The C-Prize Foundation

The natural sciences are increasingly hobbled by political controversy. This is an existential threat. By applying the practical lessons learned from the Hutter Prize over the last 17 years, The C-Prize Foundation would automate the judging of prize awards in areas of natural science where controversy currently overwhelms the “conversation”, enabling progress in fields that controversy has stalled.

18 years after the aforelinked mention of “The C-Prize” back in 2005, the time is ripe:

Although my suggestion to Musk involved radical reform of language modeling ethics, if not a radical reform of the entire field of ML/AI/AGI, The C-Prize Foundation would host prizes in any besieged natural science, not just language:

Invite influencers on all sides of any controversy to submit the data that supports their position, then challenge all sides (and anyone else) to create a smaller executable archive of the total dataset, with incremental monetary rewards for doing so. Contestants pay for the execution of their archives, and judging is done automatically:

the output matches the contest’s dataset
the archive’s size is smaller than the benchmark set by the last prize winner
pay out (perhaps proportional to the reduction in the loss function, aka size).
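As a minimal sketch of those three judging steps (the function name, numbers, and interface are illustrative assumptions, not any existing contest infrastructure; the archive's execution output is passed in as bytes, since running it happens at the contestant's expense):

```python
def judge(output: bytes, dataset: bytes,
          archive_size: int, benchmark_size: int, purse: float) -> float:
    """Automated judging sketch: return the payout earned, or 0.0.

    `output` is what the contestant's executable archive produced when
    run; `dataset` is the contest's canonical dataset; `benchmark_size`
    is the size of the last winning archive.
    """
    # 1. The output must match the contest's dataset exactly (lossless).
    if output != dataset:
        return 0.0
    # 2. The archive must be smaller than the last winner's benchmark.
    if archive_size >= benchmark_size:
        return 0.0
    # 3. Pay out proportional to the reduction in size (the loss function).
    return purse * (benchmark_size - archive_size) / benchmark_size

# Example: a 20% smaller archive earns 20% of the purse.
print(judge(b"contest data", b"contest data", 80, 100, 1000.0))  # 200.0
```

Because every step is a mechanical check on bytes and sizes, this is exactly the kind of logic a smart contract could host.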

In addition to being attractive to the cryptocurrency community via smart contracts, this Radically Reduces the Argument Surface* of the scientific controversy.

Yes, there will always be controversy over the inclusion criteria for data, and this is where the rascals will attempt to work their mischief. But the rascals have one big problem, one that will be virtually impossible to overcome without exposing their intellectual bankruptcy to what might be called “The Center” (i.e., those who suspect the Emperor Has No Clothes but have much to lose). It is indicated by point 1 of my above suggestion to Musk:

Modeling the bias itself as bias. This decreases the size of the total executable archive by compressing the biased portion of the dataset to the extent that it contains what the vernacular calls “half-truths”: once the bias is separated out as a submodel, the canonical knowledge within the biased data is admitted and compressed along with the rest.

Let’s take an extreme example that might get people interested in physics-related controversies such as climate change:

Consider a company with a particularly perverse bias that, for inscrutable reasons, produces thermometers reporting the STP boiling point of water as 101C while leaving all other substances accurately measured. Even if it were the sole source of thermometers in the world, obtaining observations of other physical phenomena and subjecting them all to compression would make that peculiar case stick out like a sore thumb; the coherent set of physical laws that maximally compresses the set of all observations would then be:

Canonical knowledge nuanced with the explicit, and politically informative, submodel of the thermometer company’s bias.
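A toy way to see the accounting (purely illustrative substances, counts, and a hypothetical unit cost per encoded exception; this is bookkeeping, not real compression):

```python
# Observations: this company's thermometers read water at 101C (true: 100C)
# but measure every other substance's boiling point accurately.
observations = [("water", 101)] * 30 + [("ethanol", 78)] * 30 + [("benzene", 80)] * 30
true_values = {"water": 100, "ethanol": 78, "benzene": 80}

# Without a bias submodel, every biased reading must be encoded as a
# separate exception to the physical laws (one cost unit each).
without_submodel = sum(1 for s, v in observations if v != true_values[s])

# With a bias submodel, a single rule ("this company's water readings
# run +1C") explains every exception at once.
with_submodel = 1

print(without_submodel, with_submodel)  # 30 1
```

The smaller total description wins, and the winning description names the bias explicitly.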

In more formal, algorithmic-information-theoretic terms, diversity of data sources permits us to talk about what might be called conditional compressibility or, more formally still, conditional Kolmogorov Complexity:


The size of the smallest program required to create string X given input Y.
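Written out in the standard notation (where U is a universal prefix machine and |p| is the length of program p):

```latex
K(X \mid Y) = \min \{\, |p| : U(p, Y) = X \,\}
```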

Although the aforelinked paper on conditional Kolmogorov Complexity applies it to rigorous causal inference (which gets to the heart of almost all political controversies involving natural science), it also permits us to quantify scientific bias (as opposed to the moral zeitgeist’s notion of bias) as an explicit cause of data corruption.

Superior models compress the total dataset better without loss of data, so money awarded for incremental reductions in the size of the executable archive is guaranteed to be money well spent toward its stated purpose: model quality. Moreover, as lossless compression approaches an asymptotically hard limit in the incomputable Kolmogorov Complexity of the data, there is an equally hard limit on the total payout, which can be estimated in a manner analogous to “proof of work” in blockchains:

The improvement per unit time is an empirical curve that is guaranteed to slow to a stop at the Kolmogorov Complexity. This rate of improvement can then be used to ramp up the prize purse over time, either automatically or by people weighing in with contributions in a manner similar to Patreon’s periodic subscriptions.
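One way to sketch that ramp (the function name, the baseline parameter, and the cap are assumptions, not a specification): track the sequence of winning archive sizes and scale the purse up as the per-period fractional improvement falls below the rate observed early in the contest.

```python
def purse_multiplier(sizes: list[int], baseline_rate: float, cap: float = 10.0) -> float:
    """Ramp factor for the prize purse as improvement slows.

    `sizes` is the sequence of winning archive sizes over successive
    periods; `baseline_rate` is the fractional improvement per period
    observed early on. As the empirical curve flattens toward the
    (incomputable) Kolmogorov Complexity floor, the multiplier grows,
    capped to keep the purse finite.
    """
    if len(sizes) < 2:
        return 1.0
    # Fractional improvement in the most recent period.
    rate = (sizes[-2] - sizes[-1]) / sizes[-2]
    if rate <= 0:
        return cap  # no progress at all: maximum incentive
    return min(cap, max(1.0, baseline_rate / rate))

# Early on, improvement matches the baseline, so no ramp; later,
# slower progress ramps the purse up.
print(purse_multiplier([1000, 900], 0.1))                 # 1.0
print(round(purse_multiplier([1000, 900, 850], 0.1), 6))  # 1.8
```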

There are extensions to this notion of automated judging of lossless compression with equally radical implications for cyber-security, the field where the notion of “The Attack Surface” (hence the analogy to “The Argument Surface”) originated. Take, for example, a project like Ubuntu where, instead of a standard dataset, there is a standard install image including not just the Linux kernel but a suite of applications in competition with Microsoft, such as LibreOffice:

the installation image expands into a working system that passes the automated test suite
the image size is smaller than the benchmark set by the last prize winner
pay out (perhaps proportional to the reduction in the loss function, aka size).
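The same judging loop carries over; as a sketch (hypothetical names again, with the test-suite run abstracted to a callable, since the real thing would expand and boot the image):

```python
from typing import Callable

def judge_image(image_size: int, benchmark_size: int, purse: float,
                passes_tests: Callable[[], bool]) -> float:
    """Install-image variant of the automated judging sketch."""
    # 1. The expanded system must pass the automated test suite.
    if not passes_tests():
        return 0.0
    # 2. The image must beat the last winner's size benchmark.
    if image_size >= benchmark_size:
        return 0.0
    # 3. Pay out proportional to the size reduction.
    return purse * (benchmark_size - image_size) / benchmark_size

# Example: a passing image 10% smaller than the benchmark earns 10% of the purse.
print(judge_image(900, 1000, 500.0, lambda: True))  # 50.0
```

The only change from the dataset contest is what "lossless" means: instead of byte-identical output, the image must reproduce all tested behavior.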

Note the correspondence here between software as a model for the various use cases toward which it is targeted, and a scientific model’s target being to account for all the data in evidence without “loss”.

It might be objected that the rascals will, for obvious reasons, refuse to participate in such a fair and objective model selection contest, for the same reasons medieval theocrats found the protocols for experimental controls published by the Gutenberg press threatening to their scribbled control of causal narratives, and hence that the C-Prize Foundation is a “nonstarter”.

This ignores one thing that is of utmost importance:

There is a substantial overlap between the cryptocurrency community and those of us who just want the objective truth – not so we can lord it over the rascals, but precisely to the extent that we aren’t rascals! The rascals, however, since they cannot abide not lording their sophistry over us, will see us departing from their “conversation” and refining our access to truth based on a revolution in the philosophy of science suited to the information age.

* It is certainly no coincidence that Nick Szabo – original thinker in cryptocurrency and smart contracts – coined the phrase “Argument Surface” regarding cryptocurrency “governance”.