
The Hutter Prize should replace the Turing test in the machine learning zeitgeist, if not in the popular mind. That it hasn’t done so is symptomatic of the “The dog ate my homework!” mentality of the machine learning world specifically, and of the philosophy of natural science, epistemology, and ontology generally.

As the guy who originally suggested the compression prize idea to Marcus back in 2005, there are a few pitfalls here that I’ve mentioned before but that bear repeating:

Algorithmic Information Theory is the general field of study arising from Kolmogorov Complexity, upon which Solomonoff Induction, Algorithmic Probability Theory and the Minimum Description Length Principle are founded. Indeed, the latter three are practically synonymous. Algorithmic Information Theory is probably the best keyphrase for people to use as an entry point into the literature. Its essence can be distilled down to the idea that a “bit” of information must be considered a bit in a machine language program: the smallest possible executable archive of a dataset of observations. The shortest such program has a length in bits. That length is the Kolmogorov Complexity of the data, and the program itself comprises the data’s Algorithmic Information. Discovering the Algorithmic Information of a set of observations is Solomonoff Induction.
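
In standard notation (nothing here is specific to the Hutter Prize; U is any fixed universal machine and ℓ(p) is the length of program p in bits):

```latex
% Kolmogorov Complexity: the length of the shortest program p that,
% run on the universal machine U, outputs the dataset x and halts.
K_U(x) \;=\; \min \{\, \ell(p) \;:\; U(p) = x \,\}
```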

Hutter’s AIXI AGI Theory = Sequential Decision Theory ∘ Algorithmic Information Theory
or
AGI = engineering ∘ natural science
or
AGI = ought ∘ is
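
Spelled out (roughly, in the notation of Hutter’s book, with a fixed horizon m and universal machine U), the AIXI agent chooses each action by maximizing expected reward under the algorithmic-probability-weighted mixture of all computable environments:

```latex
% AIXI: expectimax over future actions/observations/rewards, with each
% candidate environment q weighted by its algorithmic probability 2^{-l(q)}.
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
  \left( r_k + \cdots + r_m \right)
  \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```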

The process of discovering Algorithmic Information is Solomonoff Induction, and it may be considered the essence of data-driven natural science. This process is subject to the Halting Problem: it is provably unprovable that one has found the smallest of all possible executable archives of a dataset. This is why people say Solomonoff Induction isn’t computable, and it is the origin of the first layer of “The dog ate my homework!” from brats parading around, in the guise of scientists and machine learning experts, with literally tens if not hundreds of billions of dollars per year, making civilization-level decisions based on “The Science” as they put their pet models into practice. The laconic question for them is simply this:

Is the comparison of two integers computable?
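
The point being: the prize never asks anyone to prove minimality (that is where the Halting Problem bites). It only asks whether one candidate self-extracting archive is smaller than the current record, and that is nothing but integer comparison. A minimal sketch, with hypothetical file names:

```python
import os

def better_entry(candidate_archive: str, current_record: str) -> bool:
    """Return True if the candidate self-extracting archive is strictly
    smaller than the current record holder. Comparing two integers is
    computable, even though proving either archive minimal is not."""
    return os.path.getsize(candidate_archive) < os.path.getsize(current_record)

# Hypothetical usage: both paths are placeholders, not real prize artifacts.
if better_entry("candidate_enwik9.exe", "record_enwik9.exe"):
    print("New record: award prize money in proportion to the improvement.")
```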


Yannic’s Galactica review starts with a great rant about a language model that is a step in the right direction (including citation generation that is a lot more useful than GPT-3’s).

Here’s what I wrote in response to Yannic’s rant about the Gutenberg Press vs Theocrats:

I’ve been saying for some time (since 1982; see “Videotex Networking and the American Pioneer” at my blog, “Feral Observations”) that we’re in a historic rhyme with the period after the invention of the Gutenberg Press. People who haven’t, by now, caught on to the relationship between centralized social policy and theocratic supremacy are very likely acolytes of the modern theocratic supremacy. However, before launching into a modern Thirty Years War for social-policy freedom from these modern loyalists, we should grant them a charity based on Algorithmic Information Theory as the most unbiased model selection criterion and enter into this “conversation”:

“If you loyalists insist on centralized imposition of social policy, could you at least try to accept that the most UNbiased model selection criterion is the minimum size of any executable archive of the data? Yes, yes, we know that will require using lots of RAM and CPUs if not TPUs, but consider the cost of a modern rhyme with the Thirty Years War, which you will lose since you can’t be objective about reality – not to mention that Moore’s Law has been exponentially decreasing the cost of the aforementioned resources – OK? And yes, yes, we know that ‘the data’ may itself be ‘biased’, but then would you be happy if ‘the data’ included whatever data you use to operationally define what is ‘biased’ and what is not ‘biased’ from a scientific point of view (understanding this won’t necessarily accommodate the moral mandates of your theocracy)?”

Galactica is a step in the right direction because it incorporates quasi-recurrent algorithmic reasoning without the authors realizing that is what they are doing. Yannic goes into this in his discussion of how the “external working memory” tries to explicate the steps in reasoning so that external algorithmic executions can be invoked at inference time, after training.
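
To make “external algorithmic executions” concrete, here is the general pattern as I understand it, sketched in Python; the <work> delimiter and the generate() callable are stand-ins for illustration, not Galactica’s actual interface:

```python
import re
import subprocess
import sys

def answer_with_external_work(prompt: str, generate) -> str:
    """Sketch: let the model emit a small program inside <work>...</work>,
    run it with a real Python interpreter, and splice the result back in.
    `generate` is an assumed stand-in for whatever LM call you use."""
    draft = generate(prompt)  # model output possibly containing a <work> block
    match = re.search(r"<work>(.*?)</work>", draft, re.DOTALL)
    if not match:
        return draft  # no explicit working memory; return as-is
    program = match.group(1)
    result = subprocess.run([sys.executable, "-c", program],
                            capture_output=True, text=True, timeout=10)
    # Replace the model's own (possibly wrong) arithmetic with the
    # externally computed result.
    return draft.replace(match.group(0), result.stdout.strip())
```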


There is a paper on extracting “Truth” from large language models. Once again there is no grounding in the principle of Algorithmic Information, so I bothered to post a response suggesting why they might consider, at least, doing parameter distillation as a pre-processing step.
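
By “parameter distillation as a pre-processing step” I mean something as simple as shrinking the model’s description length before probing it for “Truth”. A crude magnitude-pruning sketch in PyTorch (the 90% figure is an arbitrary illustration, not a recommendation):

```python
import torch
import torch.nn.utils.prune as prune

def magnitude_prune(model: torch.nn.Module, amount: float = 0.9) -> torch.nn.Module:
    """Crude stand-in for 'parameter distillation': zero out the smallest-
    magnitude weights in every Linear layer, shrinking the model's
    effective description length before any truth-extraction probing."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the sparsity into the weights
    return model
```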


An extreme exploration of this kind of overparameterized enhancement of ML is in the paper “It’s Hard For Neural Networks to Learn the Game of Life”. As everyone knows, the rules of the Game of Life are quite simple but result in systems of high apparent complexity. In this paper, they train neural nets in two ways: 1) a network with a parameter count approximately the minimum necessary to encode the rules, and 2) an over-parameterized network – a “large” model. They found the “large” model converged on the rules but the small one did not.
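
For a sense of scale, the “small” end of that comparison is tiny. A network of roughly this shape (a sketch; the exact architecture in the paper may differ) has enough capacity, in principle, to encode Conway’s update rule by hand, yet gradient descent routinely fails to find those weights:

```python
import torch
import torch.nn as nn

class MinimalLifeNet(nn.Module):
    """A two-layer CNN with about two dozen parameters -- enough capacity,
    in principle, to express Conway's rule (a live cell survives with 2-3
    live neighbors, a dead cell becomes live with exactly 3)."""
    def __init__(self):
        super().__init__()
        # 3x3 convolution sees each cell and its eight neighbors.
        self.hidden = nn.Conv2d(1, 2, kernel_size=3, padding=1)
        # 1x1 convolution combines the hidden features into the next state.
        self.out = nn.Conv2d(2, 1, kernel_size=1)

    def forward(self, board: torch.Tensor) -> torch.Tensor:
        # board: (batch, 1, H, W) of 0/1 cells; output: probability of life.
        return torch.sigmoid(self.out(torch.relu(self.hidden(board))))
```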

This kind of study is really important because it exposes phenomena that, were there more serious ML researchers, would be the focus of obsessive study, e.g. what is it about the learning algorithms that permits the overparameterized model to escape from “overfitting”? If I were Kurzweil, I’d be breathlessly searching for any of Google’s employees who were asking that question and transporting them to the upper echelons, where they could deploy the huge pile of economic rent in more productive directions. But then, if I were Kurzweil, I’d long ago have gotten Google to put $10B behind the Hutter Prize for Lossless Compression of Human Knowledge.
