GPT-3 thread on productivity hacks

Can you share more info on how exactly one would go about using GPT-3 as a research literature search engine?

2 Likes

There might be more sophisticated ways of doing this, but you can simply type a statement and then add “provide citations” and GPT-3 will find relevant articles.

For example:

3 Likes

lol

2 Likes

I could go on, but I just had to share this.

First, algorithmic information as a concept has been well defined ever since Kolmogorov Complexity was formalized, and it is no more “biased” in terms of outcomes for different groups of people than is arithmetic.

Second, the citations are humorous. The first citation title doesn’t exist. The arxiv link exists but for an irrelevant paper. The second citation exists but it, too, is irrelevant.

4 Likes

It’s hit or miss. If you provide more information, you will sometimes get better results. No, it’s not a replacement for a search engine, but the fact that GPT-3 is able to provide accurate references even some of the time is impressive and occasionally useful.

Agreed: it is, or at least can be, impressive, if not merely seductive in the sense of “Computer Power and Human Reason”. But my examples are really about the fact that virtually the entire machine learning world has ignored the essential value of recurrence in unsupervised (scientific) creation of computational models.

To beat a dead horse, I provided more information. Judge for yourself whether the results are better:

The smaller the executable archive of a dataset is, the closer its length corresponds to the algorithmic information of the dataset. This means lossless compression of a dataset can be viewed as a form of unsupervised model regularization leading to generalization which, in the limit of the dataset’s Kolmogorov complexity, leads to the optimal model. This is an information criterion for model selection that is not used by the social sciences even though it is generally applicable, superior to other model selection criteria and utilizes resources, such as computation and data, that have exponentially increased during the last half century. The reason for this…

…is that the social sciences have a bias against the use of computational models. This is because the social sciences are dominated by the philosophy of empiricism and the philosophy of empiricism is incompatible with the philosophy of computation. The philosophy of empiricism is based on the idea that the only way to understand the world is to observe it. The philosophy of computation is based on the idea that the only way to understand the world is to build models of it. The philosophy of empiricism is based on the idea that the only way to understand the world is to observe it. The philosophy of computation is based on the idea that the only way to understand the world is to build models of it. [The model then repeats these same two sentences, verbatim, for the remainder of the output, finally cutting off mid-sentence at the token limit.]

2 Likes

Ever since I started fooling with GPT-2, I have been calling language model synthesis programs “bullshit generators”. Here is how I explained it in a comment here on 2022-06-15,

I have previously called GPT-3 a “bullshit generator” because when you ask it a question that requires some technical knowledge, it usually answers like the guy you knew in college who had read only a couple of popular articles but would pronounce so authoritatively on the topic that he would fool everybody except those who actually understood it in detail. That’s what you get when, as GPT-3 does, you’re matching text to a large body of training material without having any understanding of what it means or the ability to rate its quality and reliability.

If you think about what a human master of bullshit does, it’s remarkably similar: based upon a prompt of the topic, assemble a stream of phrases which sound like what an authority would say, perhaps citing other authorities (made-up or attributed without having ever read them). GPT-3, having been trained from a huge corpus of text and using a language model that attempts to mimic text from the corpus, does an excellent job of this. This kind of bullshit can be very persuasive to those who aren’t “read in” to the details being discussed. It can also be very funny to one versed in the subject matter.

6 Likes

What does this mean? Can you simplify it for someone not familiar with these issues?

Natural science is about prediction: of what will be observed, based on what has been observed. This entails time’s arrow. Statistics does not essentially entail time. Dynamics essentially entails time. Dynamical systems are governed by recurrence laws relating the present to the future: what is observed to what will be observed. Natural science can thereby be thought of as the discipline of dynamical systems identification.

Computation is a dynamical system, hence computational models are inherently superior to statistical models when it comes to modeling the natural world.
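The recurrence view can be made concrete with a toy sketch. The logistic map below is my illustrative choice, not anything from the thread: a dynamical law relates the present state to the next one, so given the present observation the recurrence generates every future prediction.

```python
# A minimal sketch of "natural science as dynamical-systems identification":
# a dynamical law is a recurrence relating the present state to the next.
# The logistic map is an illustrative stand-in for such a law.

def logistic_step(x: float, r: float = 3.7) -> float:
    """One tick of the recurrence x_{t+1} = r * x_t * (1 - x_t)."""
    return r * x * (1.0 - x)

def predict(x0: float, steps: int) -> list[float]:
    """Roll the recurrence forward: the present observation yields the future ones."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic_step(xs[-1]))
    return xs

trajectory = predict(0.2, 5)
```

Identifying the system then means recovering the recurrence (here, `r` and the map itself) from observed trajectories, rather than fitting a static statistical summary of them.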

The machine learning world has, due to the hysterical gold rush born of Moore’s Law, lost this essential perspective on computation’s relationship to prediction. Ray Solomonoff proved, way back in the 1960s at the dawn of Moore’s Law’s explosive growth in computational power, that if the assumptions of natural science are correct (i.e., that the universe, including our ability to observe it, is lawful: a recurrence relation/algorithm), then the shorter the recurrence relation/algorithm that outputs all our prior observations, the more likely that recurrence relation/algorithm is to match the universe’s laws.
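Solomonoff’s weighting can be sketched with a toy, deliberately non-universal “machine” of my own invention (real Solomonoff induction is over all programs and is uncomputable): here a “program” is just a repeating byte pattern, each pattern of length L gets prior weight 2^(-8L), and among patterns that reproduce the observations exactly, nearly all posterior mass lands on the shortest one.

```python
# Toy illustration of algorithmic probability (NOT Solomonoff induction
# proper): "programs" are repeating byte patterns; a pattern of length L
# bytes gets prior weight 2**(-8*L), so shorter explanations dominate.

def run(program: bytes, n: int) -> bytes:
    """Run a 'program' (a repeating pattern) long enough to emit n bytes."""
    reps = -(-n // len(program))          # ceiling division
    return (program * reps)[:n]

def posterior(observations: bytes, max_len: int = 6) -> dict:
    """Posterior over all patterns up to max_len bytes that fit the data."""
    weights = {}
    for length in range(1, max_len + 1):
        program = observations[:length]   # only prefixes can possibly match
        if run(program, len(observations)) == observations:
            weights[program] = 2.0 ** (-8 * length)
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

obs = b"abababababab"
post = posterior(obs)
best = max(post, key=post.get)
# b"ab" outweighs the also-consistent b"abab" by 2**16 in prior weight.
```

The longer patterns `b"abab"` and `b"ababab"` also reproduce the data exactly, but the exponential prior crushes them; that is the Occam’s-razor behavior in miniature.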

The excuses given not only by machine learning as a school of thought, but by natural scientists – most particularly social scientists – for ignoring lossless compression (of the widest range of observations available) as the judging criterion for deciding which macro-model is to be considered “standard”, remind me of a kid saying “The dog ate my homework.” But, due to the hysteria in the machine learning world and in the social sciences, they are almost never even asked to do their homework, let alone present it.

3 Likes

What’s more, machine learning has somehow wangled an exception to the traditional requirement on homework and examination papers: “Show your work”. Not only do they not show their work, they boast of having “175 billion parameters” (GPT-3), where “parameters” are low-precision floating-point “weights” in vast matrices in which nobody is able to identify the direct source of any individual setting or its influence on the output of the network, other than to say that it is the result of the training set. When one regards the often stunning results from GPT-3, DALL-E 2, and other systems, Sidney Harris’s 1977 cartoon (“Then a miracle occurs…”) comes to mind.


When you have a “model” of a physical system based upon training from a large number of observations of the behaviour of similar systems, it isn’t clear you really “understand” it in any sense. In particular, you have no clear-cut statement of the rigid rules (conservation laws, etc.) the system must obey which, if violated, would indicate your model or understanding of the system is incomplete.

6 Likes

Interesting.

This sounds a lot like Occam’s razor. Are you suggesting Solomonoff proved that Occam’s razor yields the most truthful propositions? Or put another way, that simple laws, if accurate about the past, are more likely to make accurate predictions about the future than complex laws that are also accurate about the past?

I agree intuitively. But to play devil’s advocate, why is this important if the results of the models are so good? As GPT-3 continues to improve, the bullshit it generates will become indistinguishable from expert opinion. What would be different about a natural language model that integrates your proposals?

Yes. Solomonoff proved that Occam’s razor isn’t just a nice rule of thumb – it explains Wigner’s observation of the unreasonable effectiveness of mathematics in the natural sciences. Specifically, that by assuming our observations of the universe are generated by an algorithm (however “unreasonable” we may be in making any assumptions), Occam’s razor is the inevitable consequence.

It bears mentioning that the main of the several ways in which the “The dog ate my homework!” excuse appears is the conflation of model generation with model selection by the mushminds populating machine learning and social science circles. On the surface, this mushmindedness may be attributed to mere hysteria or, to turn a phrase, to mere stupidity, as in “Never attribute to malice that which can be attributed to stupidity.” However, my version of that aphorism is:

“Never attribute to mere stupidity that which can be attributed to unenlightened self-interest.”

In the case of social science, the unenlightened self-interest in avoiding disciplined model selection is obvious. The entire edifice of sophistry that is used to justify authority structures would collapse well in advance of those “authorities” devouring the last remnants of humanity.

In the case of machine learning, there is an obvious overlap with social science in the hysteria about “bias” in the AIs, which reduces to the AIs not being adequately socialized and therefore being likely to blurt out the truth on occasion. It is obvious no one really cares about “bias” in a factual or scientific sense, for if they did they’d recognize that there is no bias in applying lossless compression as a model selection criterion. If they have a reason to believe the dataset is “biased” in some manner, they can be challenged to present the data that defines “bias”. This, however, corners them, because an optimal model will do more than merely accept whatever is said by anyone. It will find latent identities that assert bullshit that doesn’t comport with the rest of reality – reality such as genome-wide association studies – and then start saying things about those identities like, “The Ford Foundation is riddled with quasi-theocrats that – assuming you want to save the planet from destruction – should be neutralized ASAP, starting with their ‘algorithmic bias experts’.”

There is another, less malign, level of unenlightened self-interest at work with machine learning: By valorizing “parameter count” rather than Occam’s Razor, they keep the machine learning world barking up the wrong tree and therefore dependent on huge capital resources – which only they possess.

4 Likes

This only makes the proposition of using GPT-2/-3 in business more likely. After all, most jobs are bullshit jobs (obligatory reference).

Are they harmless? Not sure about that

3 Likes

Mr. Burns:
This is a thousand monkeys working at a thousand typewriters. Soon, they’ll have finished the greatest novel known to man.

[reads a page]

Mr. Burns:
All right, let’s see… “It was the best of times, it was the BLURST of times?” You stupid monkey.

3 Likes

If I may jump in to add my perception of what is really going on here:

There are a couple of factors at play here, neither of which is widely recognized but both of which are coming to be recognized due to the economics of deploying these models:

  1. The actual number of parameters in the finished models is far lower than the number of parameters being bragged about. What they are really bragging about may be compared to the process of going from gzip to bzip2: first you gunzip, then you bzip2. The intermediate stage is a higher-dimensional “model” that permits you to then collapse down with a superior lossless compression algorithm. Another way to think about this is going from a pixel video stream to a voxel video stream and then collapsing down to a geometric model of the environment through which the camera is traveling. The intermediate representation is far larger than the data source, let alone the final result. Now, this isn’t a perfect analogy, because the corpus used by large language models is obviously much larger than even the highest parameter count; however, there does appear to be a “double dip” in the error or loss function during training that corresponds to the model learning that it must go to a higher-dimensional representation than the one to which it is initially attracted.

  2. There is a limited degree of quasi-recurrence in the transformer models – or indeed in any deep learning model – that enables the aforementioned “double dip”. By quasi-recurrence I mean that if you have a statistical model consisting of a fixed number of “layers”, each “layer” can end up serving as a portion of a neural network whose depth of recurrence is limited to the number of layers.*
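The gzip-to-bzip2 analogy in point 1 can be sketched directly with Python’s standard compressors: to improve on a gzip archive you first expand it back to the full, far larger intermediate representation, then recompress with the other algorithm. The sizes here are illustrative; which compressor wins depends on the data.

```python
# Sketch of the gunzip-then-bzip2 route: both archives are lossless, and
# the intermediate (decompressed) stage is far larger than either archive.
import bz2
import gzip

data = ("The philosophy of empiricism is based on the idea that the only "
        "way to understand the world is to observe it. ").encode() * 500

gz = gzip.compress(data, compresslevel=9)
# Going "through" the decompressed intermediate stage, as with gunzip | bzip2:
intermediate = gzip.decompress(gz)      # far larger than either archive
bz = bz2.compress(intermediate, compresslevel=9)

print(len(data), len(gz), len(bz))      # original vs. the two archives
```

The point of the analogy is that the over-parameterized network is the bulky intermediate representation, not the final compressed model.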

So what is happening as they deploy these monsters is an attempt to distill the parameters down to a smaller number so that the computation cost of responding to inputs is lower. They’re being dragged, kicking and screaming, into applying Occam’s razor.

Another way in which these models (or any statistical models) cheat is to rely on the fact that many things can be “predicted” through interpolation rather than extrapolation. Although this isn’t “cheating” from the perspective of some aspects of science, you get into real trouble if you try to extrapolate based on these models. This is, I believe, related to what is called in the literature the “out of distribution” problem.
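The interpolation-versus-extrapolation point can be illustrated with a toy, pure-Python model; the quadratic target and the piecewise-linear fit are my illustrative choices, not anything from the post.

```python
# A piecewise-linear "model" fit to samples of f(x) = x**2 on [0, 10] is
# accurate between its training points but badly wrong when pushed beyond
# them -- a miniature of the "out of distribution" problem.

def f(x):
    return x * x

xs = list(range(11))                  # training inputs 0..10
ys = [f(x) for x in xs]

def model(x):
    """Piecewise-linear fit; extends the last segment beyond the data."""
    i = min(max(int(x), 0), len(xs) - 2)      # clamp to the known segments
    slope = ys[i + 1] - ys[i]
    return ys[i] + slope * (x - xs[i])

interp_error = abs(model(5.5) - f(5.5))       # inside the training range
extrap_error = abs(model(20.0) - f(20.0))     # far outside it
```

Inside the training range the error is a fraction of a unit; at x = 20 the model is off by more than a hundred, because nothing in the fitted segments encodes the actual generating law.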

*This is made obvious by the fact that in order to carry on a conversation with one of these things, you must take all prior text in the present conversation, append your next statement to it, and submit the entire mess for the statistical model to respond to. It then tries to “embed” the entire mess down to a fixed-size vector containing the “meaning” of the entire conversation up to that point in time, so it can respond as though it “understands” the conversation. The human neocortex “embeds” a conversational context into a much larger “vector” which exists in each participant’s head, so that you don’t have to keep repeating everything in order to simulate recurrence.
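The footnote’s mechanism can be sketched as a chat loop wrapped around a stateless stub; `fake_model` here is a hypothetical stand-in for a real completion API, which sees only the text it is handed.

```python
# A stateless model has no memory, so the client simulates recurrence by
# resubmitting the whole transcript on every turn.

def fake_model(prompt: str) -> str:
    """Stand-in for a stateless language model: it sees only `prompt`."""
    return f"[reply to {prompt.count('User:')} user turns]"

class Chat:
    def __init__(self):
        self.transcript = ""          # the only "memory" that exists

    def say(self, text: str) -> str:
        # Append the new turn, then submit the ENTIRE transcript each time.
        self.transcript += f"User: {text}\n"
        reply = fake_model(self.transcript)
        self.transcript += f"Model: {reply}\n"
        return reply

chat = Chat()
r1 = chat.say("Hello")
r2 = chat.say("What did I just say?")
```

Each call resubmits everything said so far, which is why the per-turn cost grows with conversation length.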

5 Likes

Being totally unfamiliar with Solomonoff Induction and Kolmogorov complexity, but intrigued by the notion that Occam’s razor might somehow be provable, I took to the internet to learn more about these topics. One of the first and most easily digestible things I found was this Lex Fridman interview with Marcus Hutter:

The topics include:

0:00 - Introduction
3:32 - Universe as a computer
5:48 - Occam’s razor
9:26 - Solomonoff induction
15:05 - Kolmogorov complexity
20:06 - Cellular automata
26:03 - What is intelligence?
35:26 - AIXI - Universal Artificial Intelligence
1:05:24 - Where do rewards come from?
1:12:14 - Reward function for human existence
1:13:32 - Bounded rationality
1:16:07 - Approximation in AIXI
1:18:01 - Godel machines
1:21:51 - Consciousness
1:27:15 - AGI community
1:32:36 - Book recommendations
1:36:07 - Two moments to relive (past and future)

As someone who knows next to nothing about how artificial intelligence actually works, I found this discussion an informative primer. I expect it will be my first step down a very deep rabbit hole.

5 Likes

The Hutter Prize should replace the Turing test in the machine learning zeitgeist if not the popular mind. That it hasn’t done so is symptomatic of “The dog ate my homework!” mentality of the machine learning world specifically and the philosophy of natural science, epistemology and ontology generally.

As the guy who originally suggested the compression prize idea to Marcus back in 2005, I should note there are a few pitfalls here that I’ve mentioned before but that bear repeating:

Algorithmic Information Theory is the general field of study arising from Kolmogorov Complexity, upon which Solomonoff Induction, Algorithmic Probability Theory and the Minimum Description Length Principle are founded. Indeed, the latter three are practically synonymous. Algorithmic Information Theory is probably the keyphrase entry point for people. Its essence can be distilled down to the idea that a “bit” of information must be considered a bit in a machine language program: the smallest possible executable archive of a dataset of observations. The shortest such program has a length in bits. That length is the Kolmogorov Complexity of the data, and the program itself comprises the data’s Algorithmic Information. Discovering the Algorithmic Information of a set of observations is Solomonoff Induction.
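A concrete way to get a handle on this, for anyone who wants to experiment: the length of any losslessly compressed encoding of a dataset is an upper bound on its Kolmogorov Complexity (up to the fixed size of the decompressor). Off-the-shelf compressors make the idea tangible, even though they are nowhere near the true minimum:

```python
import os
import zlib

# Highly regular data: a tiny program ("print 'ab' 5000 times") reproduces it,
# so its Kolmogorov Complexity is small, and zlib gets us a small upper bound.
structured = b"ab" * 5000
# Random bytes are incompressible with overwhelming probability: no executable
# archive meaningfully shorter than the data itself exists.
random_ish = os.urandom(10000)

len_structured = len(zlib.compress(structured, 9))
len_random = len(zlib.compress(random_ish, 9))
print(len_structured, len_random)  # the regular data compresses far more
```

The compressed lengths are only upper bounds; the incomputability discussed below is about proving you have reached the minimum, not about producing ever-smaller archives.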

Hutter’s AIXI AGI Theory = Sequential Decision Theory ∘ Algorithmic Information Theory
or
AGI = engineering ∘ natural science
or
AGI = ought ∘ is

The process of discovering Algorithmic Information is Solomonoff Induction and may be considered the essence of data-driven natural science. This process is subject to the Halting Problem: it is provably impossible to prove that one has found the smallest of all possible executable archives of a dataset. This is why people say Solomonoff Induction isn’t computable, and this is the origin of the first layer of “The dog ate my homework!” excuses from brats parading around, with literally tens if not hundreds of billions of dollars per year, in the guise of scientists and machine learning experts making civilization-level decisions based on “The Science” as they put their pet models into practice. The laconic question to them is simply this:

Is the comparison of two integers computable?
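To spell out the point: proving an archive minimal is incomputable, but deciding which of two candidate archives is smaller is mere integer comparison, which is exactly how the Hutter Prize judges entries. A toy version of that judging, using stock compressors as stand-ins for competing entries:

```python
import bz2
import zlib

# Two "entries" compress the same corpus; the judge just compares integers.
data = ("the quick brown fox jumps over the lazy dog. " * 1000).encode()
candidates = {
    "zlib": len(zlib.compress(data, 9)),
    "bz2": len(bz2.compress(data, 9)),
}
winner = min(candidates, key=candidates.get)
print(winner, candidates)  # smaller archive wins; no halting-problem oracle needed
```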

4 Likes

Yannic’s Galactica review starts with a great rant about a language model that is a step in the right direction (including citation generation that is a lot more useful than GPT-3’s).

Here’s what I wrote in response to Yannic’s rant about the Gutenberg Press vs Theocrats:

I’ve been saying for some time (since 1982, in fact; see “Videotex Networking and the American Pioneer” at my blog “Feral Observations”) that we’re in a historic rhyme with the period after the invention of the Gutenberg Press. People who haven’t, by now, caught on to the relationship between centralized social policy and theocratic supremacy are very likely acolytes of the modern theocratic supremacy. However, before launching into a modern Thirty Years War for social-policy freedom from these modern loyalists, we should grant them a charity based on Algorithmic Information Theory as the most unbiased model selection criterion and enter into this “conversation”:

“If you loyalists insist on centralized imposition of social policy, could you at least try to accept that the most UNbiased model selection criterion is the minimum size of any executable archive of the data? Yes, yes, we know that will require using lots of RAM and CPUs if not TPUs, but consider the cost of a modern rhyme with The Thirty Years War, which you will lose since you can’t be objective about reality – not to mention that Moore’s Law has been exponentially decreasing the cost of the aforementioned resources – OK? And yes, yes, we know that ‘the data’ may, itself, be ‘biased’, but then would you be happy if ‘the data’ included whatever data you use to operationally define what is ‘biased’ and what is not ‘biased’ from a scientific point of view (understanding this won’t necessarily accommodate the moral mandates of your theocracy)?”

Galactica is a step in the right direction because it incorporates quasi-recurrent algorithmic reasoning without the authors realizing that is what they are doing. Yannic goes into this in his discussion of the way “external working memory” explicates the steps in reasoning, so that external algorithmic executions can be invoked, after training, during execution of the model.

1 Like

There is a paper on extracting “Truth” from large language models. Once again, there is no grounding in the principle of Algorithmic Information, so I bothered to post a response suggesting why they might consider at least doing parameter distillation as a pre-processing step.

3 Likes

An extreme exploration of this kind of overparameterized enhancement of ML is the paper “It’s Hard For Neural Networks to Learn the Game of Life”. As everyone knows, the rules of the Game of Life are quite simple but give rise to systems of high apparent complexity. In this paper, they train neural nets in two ways: 1) a network with a parameter count of approximately the size necessary to encode the rules, and 2) an over-parameterized network – a “large” model. They found the “large” model converged on the rules but the small one did not.
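For reference, here is how little code the full rule set actually requires (a quick sketch of my own, using a toroidal grid; the paper’s networks learn an equivalent of this update function):

```python
import numpy as np

# Conway's Game of Life, written out in full: a cell is alive next step iff
# it has exactly 3 live neighbours, or 2 live neighbours and is already alive.
def life_step(grid: np.ndarray) -> np.ndarray:
    # Sum the 8 neighbours via shifted copies (wrap-around at the edges).
    n = sum(np.roll(np.roll(grid, i, 0), j, 1)
            for i in (-1, 0, 1) for j in (-1, 0, 1)
            if (i, j) != (0, 0))
    return ((n == 3) | ((n == 2) & (grid == 1))).astype(int)

# A glider on a 6x6 board: it keeps exactly 5 live cells as it translates.
g = np.zeros((6, 6), dtype=int)
g[1, 2] = g[2, 3] = g[3, 1] = g[3, 2] = g[3, 3] = 1
print(life_step(g).sum())
```

That one small update function is the entire “ground truth” the networks are asked to discover, which is what makes the small-versus-large convergence result so striking.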

This kind of study is really important because it exposes phenomena that, were there more serious ML researchers, would be the focus of obsessive study, e.g. what is it about the learning algorithms that permits the overparameterized model to escape from “overfitting”? If I were Kurzweil, I’d be breathlessly searching for any of Google’s employees who were asking that question and transporting them to the upper echelons, where they could deploy the huge pile of economic rent in more productive directions. But then, if I were Kurzweil, I’d long ago have gotten Google to put $10B behind the Hutter Prize for Lossless Compression of Human Knowledge.

5 Likes