Approaching Human-Level Forecasting with Language Models

Hello? Robin Hanson? Anders Sandberg? Anyone home?

Abstract
Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. Under a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters, and in some settings surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help to inform institutional decision making.

It’s no longer fun and games, guys. I approached Sandberg at the onset of the pandemic about algorithmic information as model selection, and he thought I was talking about the Bayesian Information Criterion. Even after I pointed out the difference, he wanted more evidence that it worked as model selection. Oh yeah? How about taking your position of responsibility seriously and YOU coming up with the evidence that it DOESN’T work? The burden is on YOU, not only because I do NOT have your resources, but also because the unstated assumption of ANY natural-science activity is that predictions can be made by arithmetic, which immediately invokes Solomonoff’s prior.
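To make “Solomonoff’s prior” a little more concrete, here is a minimal Python sketch under heavy simplifying assumptions. The real prior weights every program for a universal machine by 2 to the minus its length and is incomputable. This toy restricts the “programs” to binary patterns that repeat forever, charges 2 bits per pattern symbol so the code stays prefix-free, and predicts the next bit by mixing every pattern consistent with the observations. The names `hypotheses`, `prior`, and `predict_next` are hypothetical, chosen for illustration only.

```python
from itertools import product


def hypotheses(max_len):
    """Yield every binary pattern of length 1..max_len (a toy 'program' that repeats it forever)."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def prior(pattern):
    # Toy prefix-free code (an assumption, not a universal machine): each symbol
    # costs 1 bit plus 1 "continue/stop" bit, so a pattern of length n gets
    # weight 2**(-2n). Summed over all lengths these weights total 1.
    return 2.0 ** (-2 * len(pattern))


def predict_next(observed, max_len=8):
    """P(next bit == '1') from a prior-weighted mixture over consistent patterns."""
    weight = {"0": 0.0, "1": 0.0}
    for p in hypotheses(max_len):
        # Run the "program" long enough to cover the observations plus one more bit.
        out = (p * (len(observed) // len(p) + 2))[: len(observed) + 1]
        if out[:-1] == observed:  # keep only hypotheses that reproduce the data
            weight[out[-1]] += prior(p)
    total = weight["0"] + weight["1"]
    return weight["1"] / total if total else 0.5


print(predict_next("010101"))
print(predict_next("111"))
```

For `"010101"` the short pattern `"01"` dominates the mixture, so the probability of a next `1` comes out well under 1%. Longer patterns that merely memorize the data and then guess are still counted, but the length penalty drowns them out, which is the Occam’s Razor behavior the prior is meant to formalize.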

How can these people sit around in places like Cambridge and Oxford and claim to be “futurists” when they’re ignoring a half-century-old breakthrough in forecasting???

Just wait until people start asking controversial macrosocial questions of these large language foundation models, and you’ll find out that the limits of these “foundation models” are baked not so much into the models themselves as into the organizations that create them.

In AGI formal theory, the job of creating accurate models is the engine that takes all observations to date and generates the smallest executable archive of them. That archive is then conditionally expanded (i.e., at inference time, in response to a query) to make predictions. This has been known since at least the 1960s, but the social pseudoscientists are so desperate to keep us in the dark ages that they’ve lobotomized Moore’s Law, to the point that the companies monopolizing both the data and the hardware fruits of Moore’s Law are now in on it, “it” being the desperate effort to keep the biosphere from waking up to reality. Hell, even Musk’s “truth” commitment is apparently in on “it”, as measured by the statements of Musk’s favored experts. None of them have the proper perspective.
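The “smallest executable archive, conditionally expanded in response to a query” idea can be approximated, very crudely, with an off-the-shelf compressor. The sketch below is an assumption-laden stand-in rather than the method described above: `zlib` replaces the shortest program, and the hypothetical `predict` function answers a query by choosing whichever candidate continuation adds the fewest bytes to the compressed history.

```python
import zlib


def code_length(data: bytes) -> int:
    # Compressed size in bytes: a crude, computable proxy for description length.
    return len(zlib.compress(data, 9))


def predict(history: str, candidates: list[str]) -> str:
    """Return the candidate continuation that compresses best together with the history."""
    base = history.encode("utf-8")
    return min(candidates, key=lambda c: code_length(base + c.encode("utf-8")))


history = "abc" * 20
print(predict(history, ["the quick brown fox jumps over", "abc" * 10]))
```

Both candidates are 30 characters long, so the comparison rewards fitting the pattern rather than simply being shorter. Deflate only looks back over a 32 KB window, which is one reason a general-purpose compressor like this falls far short of compressing all the data.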

As I wrote to a friend this morning:

On Wed, Feb 28, 2024 at 11:26 PM… wrote:

It occurs to me that a study which might lead to explicit forms and quantification of this abstract “context” might be valuable not only to physics, and the compression of knowledge of a physics into an AI calculator capable of unaided calculations across a broad spectrum of different kinds of physics and mathematics, but also would be of value in supplying this mating lack of context, without which intelligence (artificial or not) has little value.

If anyone has done work in this area, I figure that you, of all people, will know about it. What say you?

This is the key breakthrough in the philosophy of science, and I’ve been trying to drag people, kicking and screaming, into recognizing that it happened in the 1960s, if not earlier, with the formalization of Occam’s Razor, because “the context” is the entire history of observations across all disciplines:

Algorithmically compress the fucking data – all of it – without loss – or at least as much of it as you can wrap your civilization’s mind around. Don’t hand out any money to fucking “theorists” unless they do that.

Data is “the context”. All of the data.

Yeah, OK, you have to make subjective judgements about what data to exclude, but you can at least get your fucking theorists to sit down at the table and blather at each other about that issue: what data to consider, subject to the current state of Moore’s Law (which is now coming to an end without this having happened, because these fuckers have had their heads so far up their butts for over half a century that their heads are coming out of their mouths).
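One way to read “compress the data, all of it, without loss” as a judging rule is sketched below. It is a hypothetical harness, not an existing benchmark: the `score` function and the `CODECS` table are illustrative. Each entry must reproduce the data exactly, lossy entries are disqualified, and the survivors are ranked by compressed size. A stricter version would also charge each entry for the size of its decompressor, since otherwise a “theory” could hide inside the program instead of being counted.

```python
import bz2
import lzma
import zlib
from typing import Callable

# A codec is a (compress, decompress) pair of functions on bytes.
Codec = tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def score(data: bytes, codecs: dict[str, Codec]) -> dict[str, int]:
    """Compressed size per codec; codecs that fail an exact round trip are disqualified."""
    results = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        if decompress(packed) != data:
            continue  # lossy: not a valid entry
        results[name] = len(packed)
    return results


CODECS: dict[str, Codec] = {
    "zlib": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
    # Deliberately lossy entry: silently throws away the last byte.
    "truncating": (lambda d: zlib.compress(d[:-1]), zlib.decompress),
}

data = b"some observations " * 50
for name, size in sorted(score(data, CODECS).items(), key=lambda kv: kv[1]):
    print(name, size)
```

The subjective part the post concedes (which data to include) lives entirely in how `data` is assembled; once that is agreed, the ranking itself is mechanical.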

As to who is doing it?

Aside from me?

Well, that’s what I had to drag Charlie through, kicking and screaming, with sparse identification of nonlinear dynamical systems (SINDy), which was close enough to his system-dynamics perspective that he finally “got it”. But even then, SINDy isn’t nearly as principled in its application of algorithmic information approximation as it should be.
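For readers who haven’t met SINDy: it builds a library of candidate terms (constants, polynomials, and so on) and looks for a sparse combination of them that reproduces the measured time derivatives. The original SINDy paper (Brunton, Proctor and Kutz, 2016) does this with sequentially thresholded least squares. The NumPy sketch below is a minimal version of that step on a toy system dx/dt = -2x with exact, noise-free derivatives; the function name `stlsq`, the three-term library, and the threshold value are illustrative choices, not a reference implementation.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares: fit, zero small coefficients, refit the rest."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iterations):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each state variable using only the library terms that survived.
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi


# Toy system dx/dt = -2x, sampled with exact derivatives (no noise, no numerical differentiation).
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)[:, None]
dxdt = -2.0 * x
library = np.hstack([np.ones_like(x), x, x**2])  # candidate terms: 1, x, x^2
xi = stlsq(library, dxdt)
print(xi.ravel())
```

Sparsity here comes from the hand-picked `threshold`, not from counting the bits needed to describe the model plus its residuals. That is one concrete way to read the complaint that SINDy is less principled than a full algorithmic-information treatment.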

The hysteria over “AI” has now buried what a few people were starting to realize: what the biosphere had lost over the last half century, largely because the social pseudosciences are desperate to keep people in the dark ages.

– Jim
