Kaido Orav's fx-cmix Wins 6911€ Hutter Prize Award!

Kaido Orav has just improved on the previous record for the Hutter Prize for Lossless Compression of Human Knowledge by 1.38% with his “fx-cmix” entry.

The Hutter Prize winners have, since 2006, “predicted the next token” as the basis of language modeling, many years before predicting the next token was cool. To be fair, though, the Hutter Prize doesn’t restrict winners to mere “next token” prediction.

After all, scientists are free to repeatedly pore over their datasets, “compressing” them into world models. They are only restricted to next observation prediction when designing experiments to test their models! But it is a good idea to select the best model when designing an experiment, just as it is when engineering a technology.

The Hutter Prize uses the size of an executable archive of the data as an approximation of the most principled information criterion for model selection:

Algorithmic Information

Unlike the menagerie of less principled statistical information criteria, Algorithmic Information has been known since 1964 to be the least-biased approach to the natural sciences relative to a given selection of data. Since Wikipedia embodies wide-ranging knowledge encoded as language data, it was the Hutter Prize’s selection of data.
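
To make "archive size as a model selection criterion" concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the three standard-library compressors stand in for competing "models", the sample text is made up, and the size of the decompressor itself is ignored, whereas an executable archive has to count it. None of this resembles an actual Hutter Prize entry.

```python
import bz2
import lzma
import zlib
from typing import Dict


def archive_sizes(data: bytes) -> Dict[str, int]:
    """Return the compressed size in bytes under each stand-in 'model'.

    Assumption: general-purpose compressors play the role of models.
    The decompressor's own size is NOT counted here, although an
    executable archive would have to include it.
    """
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


def select_model(data: bytes) -> str:
    """Pick the 'model' that yields the smallest archive for this data."""
    sizes = archive_sizes(data)
    return min(sizes, key=sizes.get)


# Hypothetical sample data; highly repetitive, so every compressor shrinks it.
sample = ("The quick brown fox jumps over the lazy dog. " * 200).encode("utf-8")
sizes = archive_sizes(sample)
best = select_model(sample)

for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name}: {size} bytes (original: {len(sample)} bytes)")
print("selected:", best)
```

The point of the sketch is the selection rule, not the compressors: whichever description of the same data is shortest wins, and no other tuning knob or penalty term is needed.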

The Hutter Prize is a scientific research prize. The sibling benchmark for technology is Matt Mahoney’s Large Text Compression Benchmark, which (unlike the Hutter Prize) has no resource constraints. The general purpose CPU constraint on the Hutter Prize is there, first and foremost, to avoid what Sara Hooker has described as “The Hardware Lottery”: Existing technological infrastructure may disfavor radical scientific discoveries that could otherwise point the way to new and better techniques. General purpose CPUs introduce less bias in research precisely because they are general purpose.


One of the more exasperating things about promoting the Hutter Prize – especially in places like ycombinator, which has the imprimatur of Pope Sam “Gibz $7T” Altman – is the claim that large language models are evidence that achieving “AGI” requires orders of magnitude more data than the 1GB Wikipedia snapshot of the Hutter Prize.

Aside from the fact that Hutter is widely recognized as the foremost authority on the rigorous definition of what “AGI” means in mathematical theory, there is the implication that the “throw everything including the kitchen sink at the learning algorithm” approach can achieve the claimants’ less principled notions of “AGI”.

OK, fine.

So you, dear pseudonym-created-for-this-particular-exchange-only-to-disappear-once-youve-damaged-the-world, are certain the Hutter Prize is rendered worthless by <insert specious arguments>, right?

What would you consider an event that could change your mind?

At least then we can discuss your getting an insurance company to weigh in with a bet against the Hutter Prize’s value, in a manner not unlike that used to underwrite the Ansari X-Prize.

But then, of course, pseudonym-created-for-this-particular-exchange-only-to-disappear-once-youve-damaged-the-world disappears.