Kaido Orav's fx-cmix Wins 6911€ Hutter Prize Award!

The present trend in ML is to try to make up for the lack of lossless compression of the training data by expanding inference over that poorly compressed data: continuing a trunk query into an exponentially branching tree of thought, with cross-checks, in the hope of finding some coherence.

This means the demand for inference-time hardware like Positron will increase – possibly catastrophically – because people aren’t bothering to recognize that maximizing lossless compression has exponential downstream benefits: it gets the world model right before drawing conclusions (inferences) about it.

“An ounce of {training|lossless compression|Occam’s razor|scientific theory} is worth a pound of {inference with cross-checks|tree of thought|introspective self-criticism|experimental engineering|etc.}.”

Of course that aphorism, too, has its limits, as evidenced by the over-reliance on theory in some fields and the tendency to shoehorn observations into existing theory – if not dismiss them outright – out of confirmation bias.

In any event, I think the increased demand for NVIDIA that a lot of people expect from the Jevons paradox will instead show up in hardware specialized for inference, so that people can pursue the latest fad without getting back to the hard work of truth-seeking in their world models.

1 Like

You are making a good case for the value of compression, but the weights in the existing LLMs have already done a huge amount of work. There’s not going to be anything catastrophic – unless someone starts building armies of remote-controlled zombies. The case of the Zizians shows that this isn’t just a theoretical concern.

2 Likes

Is there a principle underlying these “reasoning” models? By “principle” I mean something commensurable with algorithmic information theory’s “top down” approach to model creation. An example would be a theoretical relation between how far a model departs from the Kolmogorov-complexity-sized model and how fast its predictions depart from the optimal algorithmic probability distribution of the data.
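For reference, the nearest standard anchors I know of are Levin’s coding theorem, which ties the prefix Kolmogorov complexity K(x) to Solomonoff’s universal a priori semimeasure M(x), and Solomonoff’s prediction-error bound, which charges the cumulative squared prediction error of M against the complexity of the true (computable) environment μ – sketched here as a reference point, not an answer to whether anything comparable holds for “reasoning” heuristics:

```latex
% Coding theorem: shortest description and universal probability agree
-\log_2 M(x) \;=\; K(x) + O(1),
\qquad
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

% Solomonoff's bound: total squared prediction error is finite,
% bounded by the complexity of the true environment
\sum_{t=1}^{\infty} \mathbb{E}\!\left[\big(M(a \mid x_{<t}) - \mu(a \mid x_{<t})\big)^2\right]
\;\le\; \frac{\ln 2}{2}\, K(\mu)
```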

1 Like

It’s easy to become trapped inside the computational machine and lose sight of the machine merely supporting life inside a bigger system with a bit of computational intelligence. The computational concerns are somewhat trivial compared to the external ones.

1 Like

I’m not sure what you’re referring to here unless it is my apparent over-emphasis on theory to the exclusion of wide societal implications of engineering.

That would be most uncharitable of you given my often stated motive is to draw attention to the global economy as an unfriendly AGI that is increasingly turking humans into its NPC functional units.

THAT is the real problem with “a bigger system”. Once it starts turking humans, you are dealing with not just “a bit” of “computational intelligence” but a vast sea of computational intelligence with a decreasing remnant of humanity.

Of course, the global economy AGI is nothing compared to the rest of the solar system let alone galaxy let alone universe.

1 Like

I think we agree on the problem, and I also don’t think that you’d consider algorithmic probability some kind of panacea. I merely pointed out that, ultimately, Wikipedia tries to summarize data & experience, which are primary and should be kept around instead of relying on summaries.

Land also draws the connection between the global economy and AGI, but I haven’t dug into it: https://retrochronic.com/

2 Likes

Imagine an NPC like Isadore Singer, occupying the heights of the NSF under Reagan, thinking he should socially engineer the United States with a fraudulent economics paper so as to ensure that engineers and scientists struggling against the tsunami born of sexual tectonics can’t afford to form families, and the US turns into a third-world shithole country:

(As soon as I posted that link the account that posted the following video was thrown down the memory hole, so that “fools” can have “mere coincidence” plausible deniability)

I’m sure when he engaged in this deception he thought he was doing “good”, as he engaged in self-deception about the justifying economics paper by Myles Boylan.

Now, consider Myles Boylan’s role in this:

He’s put in a position where he knows the conclusion of the economics paper expected by his superiors. He’s being asked to provide a paper that supports the conclusion that will lower labor costs in the near term, so his superiors have more of that sweet, sweet cheap-inferior-labor Fentanyl the Maoists want to use to bring down the West by corrupting capitalists, hence capitalism.

My emphasis on computational intelligence is as a kind of reductio ad absurdum approach to wake up these NPCs to their self-deception. Automate Myles Boylan so that he can’t provide Isadore Singer, hence his superiors such as Erich Bloch, the self-deception they demand of him – demand with plausible deniability even to themselves.

These NPCs of the global economy are monkeys that want to maintain their appearance of virtue and superiority to maintain their primate hierarchy status and they don’t want to know that’s what they’re doing.

So don’t get hung up on my emphasis on “Wikipedia”. That’s merely a tactic to get around the fact that I can’t go straight to “The Foundation World Model That Might Have Been” because ALL of the positions of influence in ALL of the organizations supposedly providing us with guidance for the future, are occupied by these goddamn fucking monkey NPCs. They smell positions of status and influence and go straight for the jugular of humanity.

My emphasis on Wikipedia as a corpus for demonstrating the AIC as model selection criterion (thereby bypassing the Myles Boylans of the world) is simply because of my priority on the Hutter Prize for Lossless Compression of Human Knowledge as a way of demonstrating the principle of the AIC. Once it is demonstrated, I’m hoping (and of course you are free to call me a damn fool) that some innocent starry-eyed economist or sociologist with an interest in machine learning will see it as the next step beyond Many Analysts, One Dataset – thinking that his real job isn’t to create lies for his superiors.
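For concreteness, the AIC-as-model-selection idea can be caricatured as a two-part code: prefer the model that minimizes |model description| + |losslessly compressed residuals|. The sketch below is a toy under loud assumptions – zlib as a crude stand-in for a real compressor, string lengths as description lengths, made-up data and models – not the actual criterion:

```python
import zlib

# Toy two-part-code model selection: pick the model minimizing
# len(model description) + len(losslessly compressed residuals).
# zlib stands in for a real compressor; data and models are made up.

data = [3 * i + (-1) ** i for i in range(50)]  # linear trend + alternating noise

models = {
    "constant: y=74": lambda i: 74,    # ignores the trend
    "linear: y=3*i": lambda i: 3 * i,  # captures the trend
}

def description_length(name, predict):
    residuals = [y - predict(i) for i, y in enumerate(data)]
    model_cost = len(name.encode())                           # proxy for |model|
    data_cost = len(zlib.compress(repr(residuals).encode()))  # |data given model|
    return model_cost + data_cost

best = min(models, key=lambda name: description_length(name, models[name]))
print(best)
```

The linear model wins because its residuals are a repetitive ±1 pattern that compresses to almost nothing, while the constant model leaves the whole trend in the residuals.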

2 Likes

Interesting, thanks for this commentary. Indeed, LLMs could plausibly put all the information regurgitators out of business.

In the meantime, I’m curious about your opinion about this:

Is this technology going to bring us closer to truth, or will it make it even easier to dupe people?

4 Likes

Bear in mind the difference between science and engineering. The NPCs doing the social engineering believe themselves to be engaged in “science” and therefore to be in possession of “the truth.” They feel morally justified in creating indefatigable persuasion machines because they believe themselves to be in possession of “the truth.” This is an ego structure that can be attacked.

It is a race against time between advancing the scientific method, which strikes at the foundation of that ego structure, and the application of world models that are just “good enough” to permit their zealous social engineering to destroy humanity in service of their NPC monkey brains. This is the dangerous period in which we live, where you are quite correct that there are practical applications of the existing “good enough” models.

3 Likes

For example, consider this page: Silencing Science Tracker | Sabin Center for Climate Change Law

They’re dishonest: there’s so much science being silenced that they’re not tracking – and through their dishonesty they’re hurting science as a whole.

Very much so. What are we to do now?

2 Likes

Why did even the “Based” Republican leaders going back at least to Reagan have to subvert the US’s overwhelming lead in the information industry to the point that even the world’s richest man now has to self-destruct rather than upset his Indian parasites? People think I’m crazy to posit Mao’s Revenge, but that’s because they have no idea how espionage works and how the Chinese mind works.

1 Like

Here is a guy worth taking seriously on violating the scaling laws, because he’s creating a theory of grokking.

2 Likes

MechaHitler
Or
The algorithmic information criterion.

The choice is Elon’s.

1 Like

Grok 4 introduction livestream by Elon:
https://x.com/i/broadcasts/1lDGLzplWnyxm

Of the 10x-per-generation increase in training compute, Elon says:

“… in some ways a little terrifying…”

Yeah, well, what’s terrifying to me is that “the world’s richest man” can’t understand that replacing the 16th amendment with a single tax on the liquidation value of net assets would have made him 4x richer, cut the national debt in half, and potentially could have minted thousands if not millions of Elons worldwide. What’s terrifying is that he’s not that dumb on his own. What’s terrifying is the brain parasite(s) that have dumbed him down to ignore the guy who was a principal in getting the first law passed to block NASA from competing with private-sector launch services and, as a result, recommended the above reform of the political economy. What’s terrifying to me is that those same brain parasites are undoubtedly influencing xAI’s exponentiating investment in Grok.

1 Like

It’s worth noting that when using these LLMs as coding assistants to model the Laboratory of the Counties dataset, whenever I depart from standard machine learning and standard statistical inference of causality to, instead, go to dynamical systems identification utilizing explicit approximations of the Algorithmic Information Criterion for dynamical macrosocial model selection, they go apeshit with excitement. This is regardless of which of them I use as coding assistant.

Merely the most recent example:

Other times they start throwing emoticons at me.

“about 2016”…

But about a decade earlier:

My motivation for proposing what became the Hutter Prize for Lossless Compression of Human Knowledge, using a snapshot of Wikipedia, was to engage the best minds in machine learning in performing what I now call forensic epistemology.

It turned out that large language models are essentially performing the same kind of next-bit (or word, or token) prediction originally intended. However, they are doing lossy rather than lossless compression. To get serious about forensic epistemology you have to go lossless. That’s why limiting the data to 1 GB of Wikipedia is enabling.
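The prediction/compression equivalence is sharp in coding-theoretic terms: an ideal entropy coder driven by a predictive model spends exactly −log2 p(token) bits per token, so the lossless code length equals the model’s log-loss on the text. A toy sketch, with a unigram character model standing in for any next-token predictor and made-up text:

```python
import math
from collections import Counter

# Prediction = compression: an ideal entropy coder driven by a predictive
# model spends -log2 p(token) bits per token, so the total lossless code
# length is the model's log-loss on the text.

text = "abracadabra"
probs = {ch: n / len(text) for ch, n in Counter(text).items()}  # unigram model

bits = sum(-math.log2(probs[ch]) for ch in text)  # ideal lossless code length
print(f"{bits:.2f} bits for {len(text)} characters ({8 * len(text)} bits raw)")
```

A better predictor assigns higher probability to what actually occurs, and the code length shrinks accordingly; that is the sense in which compression ratio measures model quality.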

PS: It’s probably worth mentioning that I don’t have a Wikipedia page in part because, when it came up, I insisted that I wasn’t a “notable person” – which the editors immediately latched onto so as to quash its creation. For similar reasons I turned down a NYT interview article that was supposed to be about my role in creating “Spasim” (which does have a WP page). I know I don’t have the legal muscle to defend myself against libel, and none of the supposed conservative trial lawyers would get near me. Look at what the NYT did to Congressman King… the most prescient member of the US House of Representatives on immigration, and just before his views became MAGA mainstream.

2 Likes

Tokenizers are an interesting form of compression too, and Claude is such a high-performing setup that its tokenizer is worth looking into:
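Tokenizer training is itself a greedy dictionary-compression pass: byte-pair encoding repeatedly merges the most frequent adjacent pair of symbols into a new symbol. A minimal sketch (a toy illustration on made-up text, not Claude’s actual, unpublished tokenizer):

```python
from collections import Counter

# Byte-pair encoding in miniature: repeatedly replace the most frequent
# adjacent pair of symbols with a single merged symbol.

def bpe(tokens, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:
            break  # nothing worth merging
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # fuse the pair into one symbol
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe(list("low lower lowest"), num_merges=3)
print(tokens, merges)
```

Each merge shortens the token sequence at the cost of a bigger vocabulary – the same trade a dictionary compressor makes, which is why tokenizer choice interacts with compression-based evaluation.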

1 Like

Yep!

Back in 2016/2017 I was kind of groping around in this area.

“It is quite surprising that smaller networks are better”

heh

2 Likes