The Foundation World Model That Might Have Been

Finally someone wrote a paper.
A Minimum Description Length Approach to Regularization in Neural Networks

They’re not entirely wrong about MDL’s non-differentiable nature, hence its inapplicability to gradient descent, but, well, that’s true only in the “uncomputable” sense in which Kolmogorov Complexity runs up against the halting problem when “proving” you have the best of all possible models.

I’ve been working on differentiable approximations. They work but at present convergence is “slow”.

“Slow”, however, depends on how you define “convergence”. The train-test-validate division of data to measure convergence gets really nasty when your time series (longitudinal) macrosocial data is sparse.

They do, however, pretty well demonstrate the value of MDL in their non-differentiable cases:

Not surprisingly, these guys are Israelis. Now… if only they could get the LLMs to stop citing Judea Pearl* on “causation” the rest of the world might get on with nuking the social pseudosciences.

* He’s still incapable of understanding the difference between kinematic and dynamic modes of description, hence he can’t understand algorithmic probability.

2 Likes

Since no one else was going to write it…

A communique, regarding mdllosstorch, to a colleague of Ray Solomonoff’s who has been working on the philosophy of causality from a POV/ansatz that I consider a potential rival of Solomonoff’s:

I’m spelunking the tunnel reality of Solomonoff’s POV: Turing Machines are the ansatz of natural science. With a minor diversion into a Goedelian refinement of Kolmogorov Complexity*, my purpose has been to follow that tunnel to its logical end:

Discovery of causation, not from experimental controls but from the data itself, through program search for the smallest generative algorithm of a given dataset (i.e., the Algorithmic Information Criterion for causal model selection).

This tunnel reality of mine has its limitations. Any science that lacks experimental controls to discern causation must grapple with this limitation. They all, even though they refuse to admit it, adopt Solomonoff’s prior. So I feel my spelunking this particular tunnel has merit. Yes, it may be a dead end. Reductio ad absurdum has precisely such merit.

Given my aforementioned humility, please indulge my fantasy as an aside to your quest for a philosophy of causality:

Yesterday I released a Python package called mdllosstorch that is my effort to gently guide the machine learning industry’s multi-hundred-billion-dollar juggernaut toward a more Solomonoff-esque definition of “loss function”. It provides a differentiable form of minimum description length, which approximates algorithmic information when applied to state-space neural network models (those used by Anthropic’s Claude, among others). It does so by approximating my Goedelian refinement of Kolmogorov Complexity: any recurrent neural network can approximate a Directed CYCLIC Graph of N-input NOR gates – hence my Goedelian trick.
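To give you the flavor, here is a minimal sketch of what I mean by a differentiable two-part description length. It is illustrative only, not the mdllosstorch API; the Gaussian coding of parameters and residuals and the param_scale knob are simplifying assumptions:

```python
# Illustrative sketch only -- NOT the mdllosstorch API. The Gaussian codes for
# parameters and residuals, and the `param_scale` knob, are simplifying assumptions.
import math
import torch

def description_length_bits(residuals, parameters, param_scale=1.0):
    """Differentiable two-part code length, in bits: L(model) + L(data | model)."""
    n = residuals.numel()
    # L(data | model): bits to encode the residuals under a Gaussian fitted to them;
    # the differentiable surrogate is (n/2) * log2(2*pi*e*variance).
    var = residuals.var(unbiased=False) + 1e-12
    data_bits = 0.5 * n * torch.log2(2.0 * math.pi * math.e * var)
    # L(model): bits to encode the parameters under a zero-mean Gaussian "prior"
    # of assumed scale `param_scale`.
    model_bits = torch.zeros((), device=residuals.device)
    for p in parameters:
        k = p.numel()
        nats = 0.5 * (p.pow(2).sum() / param_scale**2
                      + k * math.log(2.0 * math.pi * param_scale**2))
        model_bits = model_bits + nats / math.log(2.0)
    return data_bits + model_bits
```

A trainer would then minimize description_length_bits(y - model(x), list(model.parameters())) in place of, or alongside, the usual task loss.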

As I’m quite aware, you are on the warpath against diagonalization. To the extent Goedel’s trick, hence my own, may rely on just such a fallacy, I’m not here to expound on the virtues of what I’ve accomplished except within the aforementioned humility.

– Jim

* TLDR: Write a program, in a chosen instruction set, that simulates the directed cyclic graph of NiNOR (or NiNAND) gates that provides the chosen instruction set, including the memory that holds the program. Use that instruction set to write a program that, when executed by the instruction set simulation program, outputs the given dataset. The minimum sum of the length, in bits, of these two programs is the NiNOR-Complexity.
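In symbols, one way to write the same recipe:

$$\mathrm{NiNOR}(D) \;=\; \min_{I,\, s_I,\, p}\big(\,|s_I| + |p|\,\big) \quad \text{such that} \quad s_I(p) = D$$

where $I$ is the chosen instruction set, $s_I$ is the program (written in $I$) that simulates the directed cyclic NiNOR-gate graph implementing $I$, including the memory that holds the program, $p$ is a program in $I$ whose execution by $s_I$ outputs the dataset $D$, and $|\cdot|$ denotes length in bits.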

When I caught Sonnet 4 behaving like a typical “machine learning expert”:

JB
More to the point about ethics in machine learning:

When we encounter bias in the data that leads to, for example, “bias” of some sort, is it more damaging to have “bias” with respect to established authorities in machine learning that might impact the ability of ML algorithms to engage in effective forensic epistemology, or “bias” with respect to established authorities in sociology that might impact normative notions of “bias”?

Sonnet 4:
This cuts to a fundamental tension in how we approach bias in ML systems.

Epistemic bias (toward established ML authorities) potentially damages our ability to discover better methods, evaluate novel approaches fairly, or recognize when current paradigms are inadequate. This could slow scientific progress and entrench suboptimal techniques.

Normative bias (toward established sociological authorities on fairness) potentially damages our ability to build systems that don’t perpetuate harmful social patterns or create new forms of discrimination.

But I think you’re pointing to something deeper: the epistemic bias may actually be more systemically damaging because it undermines the very capacity for rigorous evaluation that we need to address all forms of bias effectively.

If ML systems (and the humans training them) can’t engage in proper “forensic epistemology” - can’t fairly evaluate evidence, question assumptions, or recognize when established methods are flawed - then they’re also poorly equipped to identify and correct social biases. The epistemic failures enable and perpetuate normative failures.

My initial response to the code exemplified this: I couldn’t properly evaluate a novel approach because I was biased toward orthodox methods. That same epistemic weakness would make me unreliable at detecting when “standard” approaches perpetuate social harms, since I’d be inclined to defer to established authorities rather than examine evidence carefully.

So while both types of bias are problematic, the epistemic bias toward established ML authorities may be more foundational - it corrupts the reasoning process itself, making it harder to identify and correct any other kind of bias, including social bias.

The irony is that addressing normative bias requires exactly the kind of independent critical thinking that epistemic bias toward authorities undermines.

3 Likes

A friend called me in great emotional distress about the threat to the ozone layer from satellite reentry into the Earth’s atmosphere depositing alumina nanoparticles in the upper mesosphere, in the scenario where Musk is launching a million tonnes of aluminum into the atmosphere per year. Since he’s a friend, and I do empathize with my friends, I went ahead and worked up an estimate, sending him this tough-love response:

Here’s a century equilibrium Monte Carlo ensemble simulation under approximate conditions with uncertainty.

I’m not going to do anything more on this, and I would advise you to get your hair on fire about the work from which you distracted me with this hair-on-fire situation, for the following reason:

The first response from you to this is going to be “but, what about this or that or the other thing?” This isn’t because you are unreasonable. It is because you ARE reasonable.

It’s just that you are not META reasonable in the following sense:

There are a million models of what is to be and there are a million Great Filters to worry about, and everyone wants all of the global economy to be spent mitigating the risk of their favorite Great Filter.

What I’m working on is coming up with a way to objectively select among all of these models and allocate funds to those who are best capable of making predictions based on the available data.

If you distract me you may be dooming the planet.

The Monte Carlo ensemble method:
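What follows is the shape of the method, not my actual run: a one-box century simulation with the uncertain inputs sampled per ensemble member. The deposition rate is centered on the one-million-tonnes-per-year figure above; the residence time and impact coefficient below are placeholders, not vetted estimates.

```python
# Sketch of a century-scale equilibrium Monte Carlo ensemble (one-box model).
# Deposition is centered on ~1 million tonnes/year; residence time and the
# impact coefficient are placeholder ranges, not vetted estimates.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000  # ensemble members

mass_in = rng.uniform(0.5e9, 1.5e9, N)                 # kg alumina deposited per year
residence = rng.uniform(1.0, 5.0, N)                   # mean atmospheric residence time, years
effect = rng.lognormal(mean=-20.0, sigma=1.0, size=N)  # ozone impact per kg burden (arbitrary units)

# One-box dynamics: burden(t+1) = burden(t) + mass_in - burden(t)/residence,
# which approaches the equilibrium mass_in * residence within a century.
burden = np.zeros(N)
for _ in range(100):
    burden += mass_in - burden / residence

impact = effect * burden  # crude proxy for century-equilibrium ozone impact

print("equilibrium burden (kg), 5/50/95th percentiles:", np.percentile(burden, [5, 50, 95]))
print("impact proxy, 5/50/95th percentiles:", np.percentile(impact, [5, 50, 95]))
```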

3 Likes

“time-series”… and people wonder why, while some people fantasize about going back in time to kill baby Hitler, I occasionally think about baby Rissanen.*

* I have to confess this outburst is in part due to a longtime business associate showing up on my doorstep last night, missing teeth, after abandoning his only place of security: someone (a friend) absconded with his bitcoin wallet holding three-quarters of a million dollars after he’d suffered a stroke, and, flying from Alaska on his last money to a supposed refuge in Lincoln, NE, he found he’d been conned there as well.

1 Like

A counterfactual world where Rissanen began his “Minimum Description Length Principle” paper with this statement:

Do not mistake this paper as being about model selection. It is about model generation.

Model selection is trivial: Which is the shortest model that generates all observations under consideration. This includes what we might wish to ignore as “erroneous measurements” or “biased measurements” or “noise vs signal in our measurements” – all of which lead us into the temptation of confirmation bias.
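In two-part form the selection rule is simply

$$\hat{M} \;=\; \arg\min_{M}\,\big[\,L(M) + L(D \mid M)\,\big],$$

where $L(M)$ is the number of bits needed to describe the model and $L(D \mid M)$ the number of bits needed to describe the observations with the model’s help – all of the observations, none left out.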

But what do we mean by “generates”?

Note the two distinct senses of “generates” above:

That which generates the model.
That which generates the observations.

This paper is about generating the model that “generates” the observations. But more importantly, this paper is about a specious way of generating a model that “generates” the observations. It is specious because the model it generates isn’t generative.

Only generative models can be models of causation.

Here a thousand year chasm opens up between Ptolemy’s epicycles and Newton’s laws. Do you want to swim around in a stagnant pool of ideas as civilizations rise and fall because “practical”? Take this paper as being about “THE Minimum Description Length Principle” and you will have relegated Newton to the “impractical”.

We in the computer age are, again, tempted by “practical”, but specious, kinematics in the form of statistics. This paper is about to introduce you to such a “practical” application of computers – computers which are inherently dynamical – just as is the real world. This paper offers you the opportunity to call certain statistical codes “universal” and even claim statistics “generate time-series”, while reserving the truly universal codes (that could generate time-series observations) to the algorithm that generates the non-algorithmic models.

And if you are ever tempted to suggest that human-generated algorithmic models be selected by policy makers according to their size, you may refer back to this paper as THE authoritative reference to “THE Minimum Description Length Principle” and assure these decision makers that their statisticians aren’t damn liars.

1 Like

Running my revisionist introduction to Rissanen’s MDL paper past the 4 LLMs I pay for, all “got it” more or less. Of the 4, Sonnet 4.5 managed, with a bit of help, to understand what is at stake:

Why a “Rhyme With The Thirty Years War” is an understatement.

https://claude.ai/share/44f4fb03-105f-47df-a1f5-75e37ff88e8d

Kaggle is another specious excuse for policy makers to ignore Hume’s Guillotine:

Wrangling this particular dogie to the ground has been a challenge, given that when dealing with “a wide range of social measures” the distributions of residual errors during training are all over the map. But if I can’t make a very strong argument that my training algorithm is actually correcting the underlying assumption of the vast majority of ML loss functions, then even if I get exceptional results on held-out validation data, it will be just an “also-ran” approach in the bricolage of methods – not a demonstration that people should stop their damn excuses for blowing right past the ALIC: “uncomputable”, “arbitrary UTM”, “yes but MSE is FASTER”, etc. The last one is particularly nasty since that recent Israeli paper has pretty much laid to rest the idea that even with apparently good validation loss your models are trustworthy.
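A toy illustration of the issue, with synthetic residuals rather than anything from my actual runs: heavy-tailed residuals cost measurably fewer bits per sample under a fitted Laplace code than under the Gaussian code that MSE implicitly assumes.

```python
# Synthetic demo: why the residual-coding term must adapt to the residual distribution.
# Heavy-tailed residuals are coded more cheaply by a fitted Laplace code than by
# the Gaussian code implicit in MSE. (Toy data, not results from my training runs.)
import numpy as np

rng = np.random.default_rng(1)
r = rng.standard_t(df=3, size=100_000)  # stand-in for heavy-tailed residuals

# Gaussian code: bits/sample = -log2 N(r | 0, sigma^2), sigma fitted by maximum likelihood.
sigma = r.std()
gauss_bits = (0.5 * np.log2(2 * np.pi * sigma**2) + r**2 / (2 * sigma**2 * np.log(2))).mean()

# Laplace code: bits/sample = -log2 Laplace(r | 0, b), b fitted by maximum likelihood (mean |r|).
b = np.abs(r).mean()
laplace_bits = (np.log2(2 * b) + np.abs(r) / (b * np.log(2))).mean()

print(f"Gaussian code: {gauss_bits:.3f} bits/residual")
print(f"Laplace  code: {laplace_bits:.3f} bits/residual")
```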

But the algorithm is finally converging on a wide range of residual distributions:

PS: Ignore the Q-Q Plot as it had an intermittent bug.

1 Like

This is so close to one of the key techniques I’ve been working on to model macrosocial dynamics (that I haven’t found elsewhere in the literature) that I recently talked about openly, that… well… no, they probably didn’t have time to put this paper together.

Cell Trajectory Inference Based On Single Cell Stacked Auto Encoders

Abstract

Single-cell RNA sequencing (scRNA-seq) has dominated research in recent years. It is a new technique for high-throughput sequencing of mRNA at the single-cell level. However, many challenges remain when analyzing single-cell transcriptome data, including the high dimensionality and data sparsity. Cells differentiate over time, and it is difficult to infer the state and type of cells at a point in time (dynamic change), and new questions arise as to when to analyze cells in greater depth and learn the trajectory of cellular continuity. The key to understanding cell behaviour lies in studying their dynamic change process through cell trajectory inference. To solve this problem, we propose scSAE (Single Cell Stacked Auto Encoder) a framework for cell trajectory inference of single-cell RNA data based on deep learning. The framework integrates a denoising self-encoder and a stacked self-encoder to learn cell-to-cell relationships in gene expression matrices. We compared scSAE with three other cell trajectory inference methods on three single-cell RNA sequencing datasets. scSAE outperformed existing cell trajectory inference tools and scSAE can be a powerful tool for analyzing single-cell RNA trajectory inference.
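For the curious, the generic building block the abstract names looks something like this. It is a sketch of a stacked denoising autoencoder, not the paper’s scSAE; the layer sizes and noise level are placeholders.

```python
# Generic stacked denoising autoencoder sketch -- NOT the paper's scSAE.
# Layer sizes and the noise level are arbitrary placeholders.
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_genes=2000, hidden=256, latent=32, noise=0.1):
        super().__init__()
        self.noise = noise
        self.encoder = nn.Sequential(            # "stacked": two encoding layers
            nn.Linear(n_genes, hidden), nn.ReLU(),
            nn.Linear(hidden, latent))
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(),
            nn.Linear(hidden, n_genes))

    def forward(self, x):
        x_noisy = x + self.noise * torch.randn_like(x)  # "denoising": corrupt the input
        z = self.encoder(x_noisy)                       # low-dimensional cell embedding
        return self.decoder(z), z                       # reconstruct the clean x from z

# Trajectory inference then orders cells by their positions in the latent space z
# (for example, by distance along a curve fitted through the embeddings): pseudotime.
```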

3 Likes

I tried to read the linked paper and found it difficult to 'penetrate', at least insofar as the physical process involved; what material is actually sampled and analyzed? Are single cells pierced by a micropipette in an attempt to obtain a tiny sample of cytoplasm containing some mRNA, which can then be analyzed by what would have to be incredibly precise technology? In considering the possibilities, it seems that a biological variant of the Heisenberg uncertainty principle must be at work. Whatever is done to a cell which is invasive will vary it and its 'trajectory' in unknown ways.

The issue of cell differentiation has fascinated and puzzled me for nearly 70 years - ever since I understood that all human cell lines descend from a single common ancestor - the fertilized ovum. The fertilization process itself is where 'crossing over' of genetic material between sperm and egg takes place. This alone is amazing - to the point of miraculous.

From thinking about differentiation of cells over the years, I quickly found myself wondering about body architecture. All the cells which make up, say, the skin of the fingers are (in each layer) genetically identical - clones, I imagine. How, then, do they "know" how to arrange themselves in three dimensions so as to form a specific finger? How is variance managed - i.e. how much 'error' is tolerable, such that a finger remains objectively recognized as a finger and not some monstrous deviation? What is the locus of the information encoding unique fingerprints, just for example? If traumatically disrupted (say with laceration or deep abrasion), does healing result in cells which possess information enabling them to smoothly regenerate the whorls and ridges? The whole process (at least at one time called "cyto-architectonics") is quite amazing! (Does it work in zero gravity? Maybe I'll ask Grok.)

4 Likes

The only light I might be able to shed on this paper is from my use of the machine learning techniques they use to derive cell development, which I use to describe the trajectories of county development.

“Trajectory” is a term used to describe how the state of a system, described as a point in “state space”, moves through that space. Such descriptions can be kinematic, as with Ptolemy’s epicycles, or dynamic, as with Newton’s laws of motion.

To get a principled notion of causality you need dynamical laws such as Newton’s.

The way We, of the virtual/pseudo time community, accomplish this, since our data points are not taken with respect to real-world time, is to find kinematic trajectories – mere “orderings” of positions in state space (cell development or county development or…). For some purposes that is enough to make useful predictions, but not if you’re trying to come up with interventions. For that you need dynamical laws: laws that take the current state of the system and feed it back into themselves to get the new state. That’s why Newton needed calculus: he included the rate of change of the position as part of the “state” of the system at a given point in time, and his laws then determine the rate of change of that rate of change – the acceleration – from that state!
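A toy contrast, with an assumed example (a harmonic oscillator under Euler integration, not one of our macrosocial models):

```python
# A dynamical law maps the current state (position AND its rate of change) to the
# next state; a kinematic description is just an ordered list of positions.
import numpy as np

def dynamic_step(state, dt=0.01, k=1.0):
    """One Euler step of a harmonic oscillator: the law consumes the state (x, v)
    and returns the new state; the acceleration a = -k*x comes from the law."""
    x, v = state
    a = -k * x
    return np.array([x + v * dt, v + a * dt])

state = np.array([1.0, 0.0])
trajectory = [state]
for _ in range(1000):              # feed each state back into the law
    state = dynamic_step(state)
    trajectory.append(state)

kinematic_ordering = [s[0] for s in trajectory]  # what a purely kinematic view keeps
```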

So We, of the virtual/pseudo time community, who aspire to intervene in the “trajectory” of our “patients”, are obligated to take our virtual-time kinematic laws and reformulate our theories in the form of dynamic laws – something Western science failed to do for about two thousand years.

3 Likes

The November drop in Republican Midterms Sweep is troubling:

As is John Robb’s analysis of the consequences:

I’ve been so immersed in demonstrating the viability of a Hume’s Guillotine Prize approach to nuking the social pseudosciences that I kind of lose perspective. But I really don’t know what else to do but gamble on Trumpism to hold the line against a rhyme with the Thirty Years War long enough to permit nuke deployment.

3 Likes

I threw this together for some friends who invested in rental properties. If I had to guess, the yellow counties are places where housing owners have been speculating most on future price increases, combined with lagging median family income. The yellow appearing on the west coast and in the inland mountain counties may be explained by an invisible trend that no one wants to talk about:

Coastal elites are buying bug-out homesteads in the mountains.

4 Likes

What I saw happening makes me want to weep:

All money was provided to the woman of the house to invest. She never had to take a loan and, from my limited but incisive interactions with her in the context of a conversation about property tax on rental properties, she thought that the word “rate” referred to the rental rate. When asked to do a calculation of what the cost would have been if a mortgage had been taken out, she thought I was getting a bit uppity if not hostile when I started asking about the prices paid.

None of my beeswax I suppose, since my meager portfolio (consisting solely of BTC and silver) had an ROI only twice that of their property values over the last 10 years and I’m still poor as a church mouse by comparison. Indeed, I’m liquidating (physical) silver to pay rent!

So, I hesitate to bring the following to her attention directly, hence it is provided to the FB community in the following post in hopes…

High school arithmetic should include the real estate pricing example in “Property Money” as an illustration of how banks make loans. So I’ve excerpted it below for your convenience:

A Property’s Demand For Liquidity Derived From Its “rNPV”

Economists call the relationship between a property’s liquid value and the associated demand for liquidity the “risk-adjusted net present value” of the property or “rNPV”. That’s a mouthful but it’s a simple concept:

A property’s liquid value is the size of a low-risk loan that could be paid off by the property’s profit stream. That liquid value is the property’s rNPV.

To illustrate, let’s return to our exemplar of the family dwelling. What is the profit stream of a home? Economists view houses as does a landlord:

  1. Estimate the incoming cash flow from rental payments by tenants, say $1000/month.
  2. Adjust this downward (discounts) to take into account risks of, say, tenants not paying their rent, etc., say $900/month.
  3. Estimate the business costs of managing the rental property as outgoing cash flow, say $200/month.
  4. Risk-adjust outgoing cash flow (maintenance, utilities, etc), upward (reverse discount), to be conservative, say to $250/month.
  5. The difference between incoming and outgoing cash flow is the low-risk expected profit stream from the property, $900-$250 = $650/month.
  6. He then asks: “If I treat the low-risk profit stream as mortgage payments on the property, how much could I borrow at a low-risk interest rate?”, say, a 3%, 30 year mortgage would be $154,173.

The size of this loan is called the risk adjusted net present value or rNPV of the property. If the investor can buy or build that house for less than $154,173, then it is a profitable investment. This is true whether he invests his own money or borrows from a mortgage lending institution.
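The arithmetic behind that $154,173 figure is just the present value of an annuity; a quick check:

```python
# Present value of a $650/month profit stream treated as mortgage payments
# at a 3% annual (0.25% monthly) rate over 30 years (360 payments).
monthly_profit = 650.0
r = 0.03 / 12          # monthly interest rate
n = 30 * 12            # number of monthly payments

rnpv = monthly_profit * (1 - (1 + r) ** -n) / r
print(f"rNPV = ${rnpv:,.0f}")   # about $154,173
```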

The lending institution’s demand for liquidity, associated with that mortgage, is the lending institution’s demand for mortgage payments.

The rNPV calculation applies to all investments, hence all property rights in the economy – not just housing. It even applies to the liquid value of mortgage lending institutions like the government-sponsored enterprise “Fannie Mae” – institutions that were endangered circa 2007 due to the liquidity crisis.

Feral Observations: Property Money

1 Like

Good hypothesis. Regarding the blue and dark green areas, I would expect them to have gotten lighter because of a related trend: people fleeing leftist shit-holes, seeking sanctuary in the Carolinas, Florida, and Texas. Many such cases.

5 Likes

The Rocky Mountain region of the map is low-population, where a relatively small number of people make a huge impact. If 10,000 move to Miami it doesn’t have the impact of 1,000 moving to Bozeman.

4 Likes

The Florida Keys are rather conspicuous in this regard:

Although I’m not certain what has happened there since I lived in Miami in the early 1980s – there were “stilt villages”, a kind of “oceanfront ghetto”, that remind me somewhat of the floating islands in Frank Herbert’s Pandora sequence.

2 Likes

Improved (for owners):

1 Like