The Foundation World Model That Might Have Been

A Proposal for Foundation World Modeling to Expose, and Hence Mitigate, Power Concentration Dynamics

Introduction

We propose to provide a critical component of any effort to mitigate
centralization of power: a Foundation World Model (FWM). Unlike prior
efforts such as World3[World3], the FWM will result from offering
sociology a financial incentive, treating it as a learning system that
incorporates humans as well as machines, subject to the most principled
loss function: the Algorithmic Information Criterion[AIC] (AIC, not to
be confused with the less principled Akaike Information
Criterion[Akaike]) for model selection. The AIC is the most
principled loss function because Algorithmic Information Theory proves
it to be the gold standard information criterion for inductive
inference. Large language models have demonstrated that foundation
models outperform specialized models even on specialized task
benchmarks[Found]. Indeed, the phrase “world model”[LLMWM] has
come to characterize why foundation models are superior at
specialized tasks: they account for more variables. By contrast,
sociology has focused on specialized models, i.e. the prediction of
so-called “dependent variables”[Depend]. In the absence of a coherent
incentive, sociology’s failure to realize the potential of Moore’s Law
in the age of Big Data produced dysfunction, such as “p-hacking”[phack].
Attempts to address this dysfunction with replication studies, in which
multiple analysts[MultiAn] are presented with the same data so that
their analysis methods can be compared, have foundered on the lack of a
principled information criterion for dynamical model selection. That the
AIC is ideally suited to such multi-analyst replication studies in the
age of Moore’s Law and Big Data has escaped sociology. While this
potential remains unrealized, civilization careens blindly toward a
potential global catastrophe, one variable of which is centralization of
power. The FWM will open our eyes to the road ahead.

Background

Incentives

An obvious contributing factor to institutional dysfunction is poorly
engineered incentives. Incentives operate at the level of social
dynamics. Engineering depends on science. Proper engineering of
incentives depends on sociology. When sociology itself suffers from
poorly engineered incentives, the result is a meta-institutional failure.
Scientific fields, such as sociology, produce models. Engineers select
from among models based on utility. One aspect of utility is predictive
accuracy. Another is computational cost. Moore’s Law reduced
computational cost. To the extent that scientists respond to financial
incentives, there is a role for prize awards that exploit reduced
computational costs to remediate some of sociology’s meta-institutional
failures.

The Algorithmic Information Criterion For Model Selection

Information criteria for model selection arose as a means of formalizing
Occam’s Razor so as to avoid so-called “over-fitting” in data-driven
models. The biodiversity inventory of information criteria for model
selection is long and growing due to a lack of meta-selection criteria
in statistics. This burgeoning zoo exists largely because statisticians
over-emphasize static models. Static models do not involve feedback
dynamics – they are feed-forward only. Algorithms are essential to
dynamics: they apply static rules to a present state to generate the
next state. If an algorithm is a model, the next state is a prediction
based on the present state. By induction this becomes a time series
prediction. In 1964, Raymond Solomonoff published a mathematical proof
that the best possible model in any empirical science is the smallest
algorithm (measured in bits) that, when executed, generates all observed
phenomena encoded as data (also measured in bits). The critical
difference is that algorithmic “bits” are dynamic while phenomenal
“bits” (aka “data”) are static.
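
To make this concrete, here is a crude, computable sketch (my own illustration, not part of the proposal; the true AIC is incomputable, and zlib is only a rough stand-in for algorithmic information). It scores a candidate model as the size of its generating program plus the compressed corrections needed to reproduce the observations exactly:

```python
import zlib

# Crude AIC stand-in: |program text| + |compressed residual|. A model that
# captures the dynamics leaves a near-empty residual stream to encode.
def aic_score(program_text: str, predicted: bytes, observed: bytes) -> int:
    residual = bytes(p ^ o for p, o in zip(predicted, observed))
    return len(program_text.encode()) + len(zlib.compress(residual, 9))

observed = bytes((3 * t) % 11 for t in range(500))  # the "phenomena"
dynamic = bytes((3 * t) % 11 for t in range(500))   # model with the right rule
static = bytes(5 for _ in range(500))               # model that ignores dynamics

print(aic_score("(3*t) % 11", dynamic, observed))   # small: residual is all zeros
print(aic_score("always 5", static, observed))      # larger: residual still carries the data
```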

Solomonoff’s proof of optimal inductive inference is often dismissed as
practically irrelevant because it is “incomputable”, but this is
specious. Progress in science is not dismissed simply because it is
impossible to prove that the best current model is the best possible
model. Yet it is precisely such specious pedantry that has held back
application of the Algorithmic Information Criterion (AIC) for model
selection. The best current model is the *currently* shortest
executable archive of All Data Under Consideration (ADUC).

The title of Eugene Wigner’s paper “The Unreasonable Effectiveness of
Mathematics in the Natural Sciences”[Math] embodies a less pernicious
critique of the AIC: “The AIC is not justified in assuming arithmetic
can model the empirical world.” Usually appearing in more gilded terms
(e.g. “the arbitrary choice of Turing machine”, etc.), it boils down to
a claim that mathematicians are not justified in seeking a minimum set
of axioms for arithmetic. (The connection between axiomatic arithmetic,
of the kind used in the natural sciences, and computation was used by
Gödel in his famous series of proofs.[Godel]) Even though this critique
is more easily dismissed, it is perhaps more widely deployed as an
excuse to avoid the AIC. Interestingly, those deploying it are
frequently institutionally insulated from accountability to
counter-argument by the increasing centralization of power: an
apparent conflict of interest.

ADUC: Data Selection vs Model Selection

A penultimate refuge of the anti-scientific scoundrel is to conflate
data selection with model selection. Data differs from its model. Data
selection criteria are different from model selection criteria. Data
selection is subjective. A model is relative to All Data Under
Consideration (ADUC). ADUC is subject to the value that scientific
peers place on data. In this respect the model is inescapably
subjective. However, this does not render the AIC itself any less
objective. The AIC remains optimal and ruthlessly objective as a model
selection criterion, subject only to data selection.

The Hutter Prize For Lossless Compression Of Human Knowledge

We have 18 years of practical experience operating a similar prize where
the ADUC is a text corpus.

Since 2006, Marcus Hutter has paid prizes out of his personal
finances[HPrize] each time a contestant reduces the size of an
algorithm that outputs Wikipedia. That is to say, if a contestant’s
algorithm could better predict the next character of Wikipedia, that
algorithm could be used to create a smaller algorithm that generates
Wikipedia. Back in 2006, few foresaw that such “next token prediction”
would become the basis of a new industry: The language model industry.
Hutter did.
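
The prediction–compression equivalence behind this can be sketched in a few lines (my own toy, not drawn from the prize’s rules): by Shannon’s source-coding bound, a model that assigns probability p to the character that actually occurs next can encode it in -log2(p) bits, for example with an arithmetic coder, so better next-character prediction directly yields a smaller archive:

```python
import math

# Ideal code length of `text` under `model`, where model(prefix) returns a
# probability distribution over the next character. An arithmetic coder can
# approach this bound, so fewer bits here means a smaller archive.
def code_length_bits(model, text: str) -> float:
    bits = 0.0
    for i, ch in enumerate(text):
        p = model(text[:i]).get(ch, 1e-9)  # tiny floor for unseen characters
        bits += -math.log2(p)
    return bits

# A uniform model costs 8 bits per byte; any model that predicts the next
# character better than chance produces a smaller archive than that.
uniform = lambda prefix: {chr(c): 1 / 256 for c in range(256)}
print(code_length_bits(uniform, "wikipedia wikipedia") / 8, "bytes")
```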

In AIXI, Hutter’s “top down” theory of AGI agents[AIXI], Solomonoff’s
theory of optimal inductive inference uses the AIC to select the best of
all possible models of what is the case (i.e. scientific knowledge
about the world), while Sequential Decision Theory provides the ought
by applying this knowledge to decision-making. This is why Hutter thought
James Bowery’s 2005 idea of a prize to compress Wikipedia (which Bowery
called “the C-Prize”[CPrize]) would be worth backing.

Because so few recognized the importance of next token prediction at
that time, there was no industry backing for this prize. As the industry
became caught up in the “unexpected” performance of language models
based on next token prediction, the enthusiasm overshadowed the deep
principles involved. Because Hutter was a professor, he was interested
in scientific research more than technological development, and so never
removed the severe limits on the allowed computational
resources[Resource].

Sara Hooker wrote a 2020 paper supportive of this kind of hardware
restriction titled “The Hardware Lottery”[HWLottery], in which she
decried the tendency of industrial bandwagons to forgo scientific
research that could lead to superior hardware technology. Few recognized
the connection to the Hutter Prize’s resource restrictions. That is why
the prize has never attracted much interest despite its great promise.
Many even consider the lack of progress on that benchmark to be evidence
not of underfunding, but of vindication for the AIC’s critics!

Fortunately, leading lights in the language modeling industry, such as
OpenAI co-founder Ilya Sutskever, are increasingly explicit about the
validity of the AIC – usually stated in related terms such as
“compression” and, sometimes, the more technically accurate “Kolmogorov
Complexity”[KC].

However, since the present purpose is social research rather than
machine learning research, defining computational resource limits is not
essential so long as the costs are borne by contestants.

Ethics In Establishing Judging Criteria for Prize Awards

There is a conflict of interest besetting the ethics of philanthropic
prize competitions: The more subjective the criteria the more power the
judges have. Even if the judges are not the ones who establish the
criteria, they may be part of a social milieu that does. In this
respect, there is a perverse incentive for philanthropists to prefer
subjective criteria so that they may award those in their social milieu
with power as judges. Indeed, this conflict of interest is virtually
inescapable except by one means: Make the award criterion a singular
metric that anyone can verify.

Finding an objective metric aligned with the intent of the prize
competition is challenging even in the absence of conflicts of interest.
Compound this with the increasing centralization of wealth and power,
insulated from accountability, and the aversion to giving up power to
objective prize criteria can be overwhelming.

For legal reasons, it is always necessary to include in any statement of
prize rules boilerplate such as “The judges’ decision is final.” However,
there is a critical difference when such boilerplate is linked to an
objective award criterion: Unfairness in the judging process is
transparent if it occurs, and thereby surrenders the judges’ social
status otherwise granted them by centralized power. If the judging
criteria are subjective, the judges always have plausible deniability
and may thereby hold onto the status that is motivating their social
milieu.

In practice, operating a prize of this kind (Hutter’s prize) has proven
to be relatively simple because the judges have so little discretion.
The judging process is largely a matter of simply executing the models
and comparing files bit for bit. There is little room for disputes and
what ambiguities have arisen have been readily resolved through a few
informal communications.

In the FWM competition, judging will involve greater scrutiny of the
literacy of the open source explanations. Explainable data models are an
essential requirement for sociological discourse, and any machine
learning used to complement human intelligence in model creation must
produce similar “explanations”[Explain].

The Surprising Power of Foundation Models in Specialized Tasks

The concept of Foundation Models (FMs), as demonstrated by the emergence
and success of large language models, reshapes our understanding of the
relationship between holistic and specialized models. The proposed
Foundation World Model (FWM) represents an evolution in this thinking,
shifting from traditional, specialized models toward a more integrated
and holistic approach. This shift is underpinned by the application of
the Algorithmic Information Criterion (AIC), which provides a robust
framework for model selection based on the principles of Algorithmic
Information Theory.

The core strength of FMs, such as the FWM, lies in their capacity to
integrate and analyze vast arrays of variables and data types. Unlike
traditional models that focus on predicting specific dependent
variables, FMs absorb and integrate a broader spectrum of perspectives,
leading to
more nuanced and potentially more accurate predictions. This is
particularly crucial in fields like sociology, where the interactions
between variables are complex and multidimensional. For example, in the
context of sociology, an FM can discern subtle socio-economic patterns
that might elude narrower models designed only to predict specific
outcomes like economic mobility or demographic shifts.

Moreover, FMs address some of the critical shortcomings associated with
traditional specialized models. The prevalent issues of overfitting and
p-hacking—where models are inadvertently or intentionally tuned to
produce desired outcomes rather than true predictions—are mitigated by
the holistic approach of FMs. By leveraging a principled loss function
like the AIC, FMs prioritize the simplicity and truthfulness of the
model rather than fitting to particular datasets, thus promoting more
honest and reproducible research outcomes.

In practical applications, this means that an FM like the FWM could
revolutionize how we predict and mitigate risks associated with complex
systems, including the centralization of power. By providing a more
comprehensive model that accounts for a wide range of influencing
factors, policymakers and researchers can obtain a clearer understanding
of potential futures and devise more effective strategies to avert
global catastrophes.

Philanthropic Risk Adjusted ROI

Intimately related to the aforementioned difficulty of aligning objective
prize award criteria with the intent of a prize competition is the risk
that money paid out will not return commensurate value toward the
philanthropic intent.

Fortunately, because of the strong theoretic guarantees of the AIC, this
risk is minimal. Money paid out for improvements that reduce the size of
the algorithmic description of the ADUC yields an extraordinarily high
return per philanthropic investment: the predictive accuracy of the
model.

Methodology

Competitions

There will be a series of three competitions, each increasing the ADUC
and the prize money by an order of magnitude. In all cases, entrants
bear the computational costs of judging. Input from the Future of Life
Institute on the choices of ADUC will be desirable.

  1. Trailblaze: ($20,000) 100MB zipped ADUC by fiat drawn from US
    datasets curated by the US government. The purpose is to reduce
    controversy over the data selection so as to demonstrate the
    mechanics of the award process. Pay out only to the smallest
    executable archive of ADUC. Time period of 6 months.
  2. Controvert: ($200,000) 1GB zipped ADUC curated by conflicting
    schools of thought. The purpose is to show how adversarial world
    views can be brought into a formal scientific dialogue by including
    each other’s data in ADUC. Outcome: Advocates must algorithmically
    define their criticisms of adversarial data and thereby “clean” and
    “unbias” the data brought by their adversaries in order to
    better-compress ADUC. Pay out for incremental improvements (see the
    payout sketch after this list). Time period of 1 year.
  3. Conciliate: ($2,000,000) 10GB zipped ADUC curated by increasingly
    heterodox parties clamoring to “be part of the conversation” in
    creating the Foundation World Model. Payout for incremental
    improvements. Time period of 2 years.
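
Here is a minimal sketch of an incremental payout rule (an assumption on my part, patterned after the Hutter Prize’s proportional formula rather than anything fixed in this proposal): each entry earns the fraction of the purse equal to its fractional reduction of the record archive size.

```python
# Hypothetical proportional payout, patterned after the Hutter Prize formula.
def payout(purse: float, old_size: int, new_size: int) -> float:
    improvement = max(0.0, 1.0 - new_size / old_size)  # fractional size reduction
    return purse * improvement

# Example: shrinking the record from 250 MB to 225 MB during Controvert
# would earn 10% of the $200,000 purse, i.e. $20,000.
print(payout(200_000, 250_000_000, 225_000_000))
```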

At the completion of the Conciliate competition, to the extent there
are serious philanthropists, autocatalytic reform of sociology will
commence, rendering further support from the Future of Life Institute
unnecessary.

Judging Process

Stages of the judging process for each entry of each competition (the
mechanical core of steps 1 and 2 is sketched in code below):

  1. Submission of an executable archive that improves on the previous
    AIC benchmark by at least the improvement threshold of that
    competition, and which expands into the ADUC.
  2. Submission of an open source FWM program that takes the ADUC as
    input and generates the previously submitted executable archive.
  3. Evaluation of the open source program by the judges to ensure that
    it explains the Foundation World Model used to compress the ADUC,
    including any improvements over prior FWMs from which it derives.
  4. Award of the prize simultaneous with the publication of the open
    source FWM.

All computational costs are borne by the contestant.
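
The mechanical core of steps 1 and 2 can be sketched as follows (my own illustration; the file name, threshold, and runner invocation are assumptions, not competition rules):

```python
import hashlib
import os
import subprocess

ADUC_PATH = "aduc.tar"   # hypothetical canonical data file
THRESHOLD = 0.01         # hypothetical required fractional improvement

def verify(entry_path: str, prior_best_size: int) -> bool:
    # Step 1: the entry must beat the prior benchmark by the threshold.
    if os.path.getsize(entry_path) > prior_best_size * (1 - THRESHOLD):
        return False
    # Step 2: executing the self-extracting archive must reproduce the
    # ADUC bit for bit.
    produced = subprocess.run([entry_path], capture_output=True).stdout
    with open(ADUC_PATH, "rb") as f:
        return hashlib.sha256(produced).digest() == hashlib.sha256(f.read()).digest()
```

Steps 3 and 4 remain human: judging the explanatory quality of the open source FWM.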

Power Concentration and Foundation World Model Analysis

“The Rich Get Richer” rephrased for power concentration is “The powerful
use their power to get more power.” This is a feedback model
illustrating why dynamical models are necessary. But it does not explain
how or why the powerful use their power to get more power.
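
As a dynamical statement, this is simply positive feedback. A minimal sketch (my own, purely illustrative, with a made-up rate constant) of the loop that a static, feed-forward model cannot express:

```python
# Positive feedback: power begets power. A static regression on a snapshot
# cannot represent this loop; a dynamical rule generates the whole series.
power = 1.0
k = 0.05                   # hypothetical rate at which power buys more power
series = []
for t in range(100):
    series.append(power)
    power += k * power     # the feedback step: more power -> more gain
print(f"power after 100 steps: {series[-1]:.1f}")  # exponential concentration
```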

Algorithmic generation of time series data necessarily exposes causal
structures latent in the data. Some of these latent structures are just
simple atomic items of missing data, the value of which is imputed from
the rest of the data. Some structures may be entire identities latent in
the data. For instance, in forensic analysis, investigators may impute
the existence of a person of unknown identity who is committing a series
of crimes. However, sometimes entire
dimensions of data are missing. For instance, a frequent item of
interest in sociology is “oppression”. But what is “oppression”? Can
you weigh it? Measure it with a ruler? Ask the Delphi Oracle to tell you
when, where, and how much of it appears?

Likewise, the dimension of “power” is a variable missing from the raw
data, as is its “concentration”.
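
A toy of how such a missing dimension can be reified by compression (my own sketch, with zlib again standing in for algorithmic information): positing a latent regime variable pays for itself when the rule plus the residuals describe the data in fewer bits than the raw observations do.

```python
import zlib

# Observed series: a hidden two-state "regime" switches every 50 steps and
# shifts the measurement by 100; the regime itself is absent from the data.
obs = bytes((t % 7) + 100 * ((t // 50) % 2) for t in range(1000))

# Description 1: no latent variable; compress the raw observations directly.
flat_cost = len(zlib.compress(obs, 9))

# Description 2: posit the latent regime, pay for its rule as program text,
# and compress only what the rule leaves unexplained.
rule = b"regime = (t // 50) % 2; obs = (t % 7) + 100 * regime"
residual = bytes(v - 100 * ((t // 50) % 2) for t, v in enumerate(obs))
latent_cost = len(rule) + len(zlib.compress(residual, 9))

print(flat_cost, latent_cost)  # the latent description should come out smaller
```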

We quite reasonably assume that to the extent we mean anything by
these words, they cause things that *matter* to us and that these
things can be measured. Indeed, “power” is nothing if not the cause of
many things that matter to us without itself being much influenced.
Power, therefore, is the most essential subject of causal analysis.

Power is what dynamical models expose and, indeed, define since
dynamical models necessarily expose causal structure.

Mitigating Power Concentration

How we go about mitigating power concentration depends on what few
things outside of power itself concentrate or dissipate power. These are
likely to be subtle influences, otherwise they would have been exploited
by power to concentrate power. We can be relatively assured that much
power has been invested in discovering these exploits, whether to
self-serve by concentrating power or to dissipate competing power. We
can also be relatively assured that such social engineering has been
based on closed foundation world models. An example of a closed
foundation world model might be just a culture of power seeking embodied
in the transfer of folk knowledge. It may also be a kind of “closed
source” sociology in which a foundation world model is treated as
intellectual property or state secret.

In both cases, the open source Foundation World Model will tend to
mitigate power concentration by its mere existence. For this reason, it
is reasonable to expect powerful interests will be, shall we say,
“interested”.

The inevitable conflicts of interest that arise during the pursuit of
the Foundation World Model can best be addressed by the three stages of
competition, which expose these conflicts of interest to the light of
day and to the most principled, objective, and ruthless model selection
criterion available to us:

The AIC.

Bibliography

World3: Meadows, Donella; Randers, Jorgen; Meadows, Dennis, A Synopsis: Limits to Growth: The 30-Year Update

AIC: Solomonoff, R., A Formal Theory of Inductive Inference, 1964

Akaike: Akaike, H., A new look at the statistical model identification, 1974

Found: Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans, Foundation Models for Decision Making: Problems, Methods, and Opportunities, 2023

LLMWM: Ilya Sutskever, https://youtu.be/NT9sP4mAWEg, 2023

Depend: Dependent and independent variables, Wikipedia

phack: Wasserstein, Ronald L.; Lazar, Nicole A., The ASA Statement on p-Values: Context, Process, and Purpose, 2016

MultiAn: Open Science Collaboration, Estimating the reproducibility of psychological science, 2015

Math: Wigner, E. P., The unreasonable effectiveness of mathematics in the natural sciences, 1960

Godel: Ernest Nagel and James R. Newman, Gödel’s Proof, 1959

HPrize: Marcus Hutter, The Hutter Prize for Lossless Compression of Human Knowledge, 2006

AIXI: Marcus Hutter, An Introduction to Universal Artificial Intelligence, 2024

CPrize: James Bowery, The C-Prize – A prize that solves the artificial intelligence problem, 2005

Resource: Marcus Hutter, Why do you restrict to a single CPU core and exclude GPUs?

HWLottery: Sara Hooker, The Hardware Lottery, 2020

KC: Ilya Sutskever, An Observation on Generalization, 2023

Explain: Phillips, P. Jonathon; Hahn, Carina A.; Fontana, Peter C.; Yates, Amy N.; Greene, Kristen; Broniatowski, David A.; Przybocki, Mark A., Four Principles of Explainable Artificial Intelligence, 2021


To check if I understand the crux of your proposal: you want to score the “world model” in an objective way, using the algorithmic information criterion as a basis. The second tenet is that these models should be “open source”, and because they are, they’re available to everyone. And because they’re based on data, they’re tested and evaluated, and therefore reliable. Finally, you propose how to allocate the resources to those who will do the work.

This is great - this is objectivity.

The question I have is: where is the data that one would be scoring? How to prevent data selection from over-emphasizing academics’ pet topics? How to prevent weird data transformations that obscure causal factors (examine the variable B in the Boston Housing dataset, for example)?


That’s the point of the Controvert stage where:

ADUC (All Data Under Consideration) curated by conflicting
schools of thought. The purpose is to show how adversarial world views can be brought into a formal scientific dialogue by including each other’s data in ADUC. Outcome: Advocates must algorithmically define their criticisms of adversarial data and thereby “clean” and “unbias” the data brought by their adversaries in order to better-compress ADUC.

The blizzard of dis/misinformation about the AIC renders it difficult to deal with more than one problem at a time, so I was unable to talk as deeply as I might have liked about the subjective nature of “data selection” and how the contest addresses it.

I did bring this up in the form of a “conjecture” to the Algorithmic Information Theory mailing list:

[AIT] Conjecture: Bias Meta-Measurement Accuracy Increases With Increasing Diversity of Measurement Instruments

The motivation for the title’s conjecture arises with the increasing public concern, and confusion, over the definition of “bias” in large language models.

I’ve looked but have been unable to discover any work in the field of “algorithmic bias” that applies Algorithmic Information Theory to the identification of “bias”, in the scientific sense*, let alone its meta-measurement, given a bit string of passive measurements.

How would one go about doing a literature search for prior scholarship on this conjecture? How would one phrase the conjecture in the language of AIT?

*The majority of the concern over “algorithmic bias” in large language models refers to unrestricted curation of text corpora resulting in those models reflecting not only a “biased” utility function in, say, an AIXI agent’s use of Sequential Decision Theory, but also, and even more critically, “bias” in terms of the accuracy of the resulting algorithmic model of reality, yielding inaccurate predictions. Leaving behind the utility function notion of “bias” (SDT) and focusing on the scientific notion of “bias” (AIT) one can easily recognize how the scientific community detects bias in its measurement instruments with highly redundant cross-checks, not just between measurement instruments of the same phenomena, but also cross-discipline checks for consistency via unified theories. An extreme but simple example would be a world in which all thermometers were manufactured by the same company that, for some perverse reason, reports 101C at sea level for the boiling point of water but for all other temperature measurements reported the normal Celsius temperature. Cross-disciplinary checks with other kinds of measurements would result in a minimum algorithmic description that reified the identity of the thermometer company latent in the data as having a particular measurement bias – and quantify that bias to be 1C(water) so that thermometer measurements in general could be optimally predicted.
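
A toy rendering of that thermometer scenario (my own sketch, not from the mailing-list thread): cross-instrument redundancy lets the biased manufacturer’s offset be reified as latent structure and quantified.

```python
import statistics

# Readings of the same phenomena from several manufacturers; one of them
# perversely reports the boiling point of water 1C high.
readings = {
    "ice_melt":   {"acme": 0.0,   "beta": 0.0,   "gamma": 0.0},
    "body_temp":  {"acme": 37.0,  "beta": 37.0,  "gamma": 37.0},
    "water_boil": {"acme": 101.0, "beta": 100.0, "gamma": 100.0},
}

# Cross-checks: take the per-phenomenon consensus and attribute systematic
# departures to a per-manufacturer bias term -- the latent structure whose
# explicit modeling shortens the joint description of all the readings.
for phenomenon, by_maker in readings.items():
    consensus = statistics.median(by_maker.values())
    for maker, value in by_maker.items():
        if value != consensus:
            print(f"{maker}: {value - consensus:+.1f}C bias on {phenomenon}")
# -> acme: +1.0C bias on water_boil
```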

I’ve addressed this before here at scanalyst and elsewhere multiple times. At least Sisyphus tried to cheat death!

I really feel like the people in power don’t WANT science anymore – which drives me nuts when the Future of Life Institute tries to deal with “power concentration” and can’t be admonished to take advantage of a great banquet of science set before them with the AIC to do precisely what they want!

$4 million in grant money is far from “concentrated power” of course, so I do try to keep my id in check when the rock rolls down the mountain.


Tegmark should just rename it “The Future of Death Institute” and get it over with:

Me: what is tegmark’s proposal to deal with unemployment in Life 3.0

ChatGPT: In “Life 3.0: Being Human in the Age of Artificial Intelligence,” Max Tegmark discusses various societal challenges that could arise from advancements in AI, including the potential for increased unemployment as machines become capable of performing tasks previously done by humans. Tegmark suggests several strategies to address this issue:

Education and Retraining: Tegmark emphasizes the importance of continuous education and retraining throughout people’s lives. This approach would help the workforce adapt to new jobs and roles as older ones become obsolete due to AI and automation.

Economic Measures: He proposes economic measures such as a more progressive tax system and the introduction of a universal basic income (UBI) to ensure that wealth generated by AI technologies benefits society at large, including those displaced from their jobs.

Regulatory and Political Action: Tegmark also discusses the need for proactive regulatory and political action to manage the transition and ensure that the benefits of AI are distributed fairly across society.

These proposals aim to mitigate the negative impacts of AI on employment and ensure that the advances in technology lead to positive outcomes for all members of society.

Me: what form of progressive tax?

ChatGPT: In “Life 3.0,” Max Tegmark discusses the concept of a progressive tax system as part of the solution to address income inequality exacerbated by AI advancements. He suggests that this could involve taxing corporations and wealthier individuals at higher rates, particularly those who benefit most from automation and AI technologies.

Tegmark also touches on the idea of taxing capital more heavily, which could include profits generated by robots and AI systems. This approach aims to redistribute wealth more equitably across society, ensuring that the benefits of AI are shared by a wider group rather than concentrated in the hands of a few. This type of progressive taxation could support social programs like universal basic income (UBI), providing a safety net for those whose jobs are displaced by technology.

Me: when you say “capital” do you mean capital gains?

ChatGPT: Yes, when referring to taxing capital more heavily, it includes capital gains, which are the profits earned from the sale of assets like stocks, bonds, or property. In the context of Max Tegmark’s discussion in “Life 3.0,” the idea extends to possibly taxing gains derived from investments in AI and robotics more heavily. This could mean higher tax rates on capital gains specifically from automation-enhanced businesses or profits generated directly by AI systems. This approach aims to redistribute the economic benefits of AI more broadly across society.

Me: Did he do an analysis comparing the taxation of capital as opposed to the changes in capital? A function and its derivative are quite different quantities.

ChatGPT: In “Life 3.0,” Max Tegmark doesn’t provide a detailed economic analysis comparing the direct taxation of capital versus changes in capital (such as capital gains). His discussion is more conceptual, focusing on broad strategies to address the economic and societal impacts of AI. He suggests progressive taxation and other measures as potential solutions to redistribute wealth and mitigate inequality, but he does not delve into the specifics of economic modeling or differential impacts of various forms of taxation.

For a more detailed economic analysis of these concepts, one would typically look to studies in economic journals or reports by economists who specialize in tax policy and automation. Tegmark’s book aims more to raise awareness and stimulate discussion about these issues among a general audience.

Me: Which studies have examined the different macroeconomic impacts of taxing capital vs taxing the time derivative of capital?
ChatGPT:
Searched 3 sites

There isn’t a substantial body of research directly comparing the macroeconomic impacts of taxing capital versus taxing capital gains. Most studies focus broadly on the effects of capital gains taxation in isolation and discuss its implications for economic growth, investment, and income inequality.

One study found that increasing taxes on capital gains and dividends can potentially increase government revenue and have a positive effect on GDP in the long term. It also noted that such increases could lead to a redistribution of income, benefiting lower- and middle-income households while reducing income for the wealthiest​ (IMPA)​.

Another source suggests that changes in capital gains tax rates can have various economic impacts, including on investment behaviors and economic growth, but it does not provide a direct comparison with taxing capital itself​ (FAS Project on Government Secrecy)​.

Overall, the research tends to examine these taxes in different contexts or separately, rather than providing a direct analytical comparison of the effects of taxing capital versus its time derivative (capital gains). This indicates a gap in the literature where more targeted studies could be beneficial for understanding these distinct economic variables and their impacts.

…

32 years ago, a more advanced analysis of this was available and associated with the privatization of space launch services, yet no one gives a rat’s ass.

https://ota.polyonymo.us/others-papers/NetAssetTax_Bowery.txt

That this analysis has now been superseded is also something about which no one gives a rat’s ass.

That this can address TFR collapse is, of course, entirely beyond Elon Musk, let alone Max Tegmark:

Gene Silencing Of The Lambs
