Francois Chollet - LLMs won’t lead to AGI - $1,000,000 Prize to find true solution

You won’t find a more principled AIQ benchmark than lossless compression of Wikipedia, which has now been out there for 18 years. The resource limits of the Hutter Prize for Lossless Compression of Human Knowledge exist to avoid the research-stifling dynamic Sara Hooker describes in her paper “The Hardware Lottery.” Even if one disagrees with those limits, there is the Large Text Compression Benchmark, based on the same Wikipedia snapshot, which has no resource limits and has likewise been running for 18 years. Lack of funding is what has held this back over that period.

It is a mistake to think that approximating the Kolmogorov Complexity of a corpus like Wikipedia won’t advance reasoning, whether in the abstract about things like algebra or more concretely about physics and other real-world predictions. Decompression is a process of inference. Conditional decompression is a process of responding to a condition, aka a “hypothetical situation” or “problem statement,” which, in the case of a purely scientific model, means predicting what is likely to follow from a hypothesis (what result will obtain under a given experimental condition).
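To make the prediction–compression link concrete, here’s a minimal sketch (the toy corpus and models are mine, not part of either benchmark) of the Shannon ideal code length: a model that predicts the next symbol well assigns it high probability, and the compressed size is just the sum of -log2 of those probabilities, so better prediction literally is better compression.

```python
import math

# Toy corpus: next-character prediction over a two-letter alphabet.
text = "abababababababab"

def code_length_bits(text, predict):
    """Ideal (Shannon) code length: sum of -log2 P(next char | context)."""
    total = 0.0
    for i, ch in enumerate(text):
        p = predict(text[:i]).get(ch, 1e-12)
        total += -math.log2(p)
    return total

# Model 1: knows nothing -- uniform over {a, b}.
uniform = lambda ctx: {"a": 0.5, "b": 0.5}

# Model 2: has "learned" the alternating structure of the corpus.
def alternating(ctx):
    if not ctx:
        return {"a": 0.5, "b": 0.5}
    return {"b": 0.99, "a": 0.01} if ctx[-1] == "a" else {"a": 0.99, "b": 0.01}

print(code_length_bits(text, uniform))      # 16.0 bits: 1 bit per character
print(code_length_bits(text, alternating))  # ~1.2 bits for the whole string
```

An arithmetic coder driven by the second model would actually achieve nearly that length, which is why a benchmark that rewards shorter codes is rewarding better models.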


Chollet is quite explicit that the idea is not to have a program solve the problems directly, but to have the program write a program for each example solution (or set of examples) that can solve all problems of that type, which seems like a very good idea in general. He doesn’t seem to have tried Rasch measures or item-response theory, so he doesn’t yet have a real (ratio, or at least equal-interval) measure of intelligence, nor, equivalently, a measure of question difficulty, but that should not be too difficult to add.
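A minimal sketch of that program-writing-programs idea, using a made-up four-primitive string DSL (nothing like Chollet’s actual ARC domain): given a few input/output demonstrations, enumerate compositions of primitives until one explains every example, then return that program rather than the answers themselves.

```python
from itertools import product

# Hypothetical mini-DSL (illustrative only).
PRIMS = {
    "reverse":  lambda s: s[::-1],
    "double":   lambda s: s + s,
    "head":     lambda s: s[:1],
    "identity": lambda s: s,
}

def synthesize(examples, max_len=3):
    """Enumerate primitive compositions; return the first program
    consistent with every (input, output) demonstration."""
    for n in range(1, max_len + 1):
        for prog in product(PRIMS, repeat=n):
            def run(s, prog=prog):
                for name in prog:
                    s = PRIMS[name](s)
                return s
            if all(run(i) == o for i, o in examples):
                return prog
    return None

# Two demonstrations of an unknown rule; the solver writes the program.
print(synthesize([("abc", "cbacba"), ("xy", "yxyx")]))  # ('reverse', 'double')
```

The returned program then solves any new instance of that problem type, which is exactly the generalization being asked for; real systems replace brute-force enumeration with guided search, but the contract is the same.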

It’s remarkable to me how intentionally crippled the current AIs still are: mostly no web search, no ability to read large documents or websites, not even the ability to use any kind of software, and it’s the last that is the easiest and most useful to add. Current code generators don’t even check whether their produced code throws errors, let alone works, because they can’t. But not too long from now, they’ll have all of GitHub at their disposal, their generated programs will themselves be code generators, all levels of code will be full of introspection and debugging aids, and the AI will be able to wrap things in a “do what I mean” graphical / VR / voice interface specially tailored to any task, making textual code something that rarely needs to be worked on directly.
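The “at least check that it runs” step is trivial to bolt on. Here is a sketch (function name and structure are mine) of the minimal sanity check a code generator could apply to its own output before handing it to a user:

```python
import subprocess, sys, tempfile

def smoke_test(code: str, timeout=5):
    """Minimal self-check for generated code: does it parse,
    and does it execute without raising?"""
    try:
        compile(code, "<generated>", "exec")        # syntax check, no execution
    except SyntaxError as e:
        return False, f"syntax error: {e}"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # Hand the last traceback line back to the generator for a retry.
        return False, proc.stderr.strip().splitlines()[-1]
    return True, proc.stdout

print(smoke_test("print(1 + 1)")[0])   # True: parses and runs cleanly
print(smoke_test("print(1 +")[0])      # False: syntax error
```

Feeding the error string back into the model for another attempt is the obvious next step, and is exactly the kind of introspection loop described above.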


Papers I was reading today:
Relevant to compressing physical / spatial intuition into transformer kernels:

Geometric Algebra Transformer
by Johann Brehmer, Pim de Haan, Sönke Behrends & Taco Cohen. 37th Conference on Neural Information Processing Systems (NeurIPS 2023).

Abstract
Problems involving geometric data arise in physics, chemistry, robotics, computer vision, and many other fields. Such data can take numerous forms, for instance points, direction vectors, translations, or rotations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric (or Clifford) algebra, which offers an efficient 16-dimensional vector-space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to E(3), the symmetry group of 3D Euclidean space. As a Transformer, GATr is versatile, efficient, and scalable. We demonstrate GATr in problems from n-body modeling to wall-shear-stress estimation on large arterial meshes to robotic motion planning. GATr consistently outperforms both non-geometric and equivariant baselines in terms of error, data efficiency, and scalability.

The algebra isn’t hard; it’s basically the same 3D (+1) that computer graphics has used since forever, with the extra dimension being a null-square “bookkeeping” dimension that makes projective calculations easier, particularly projecting from a 3D scene down to a 2D computer screen. The geometric algebra part of this also allows not just point vectors, but also lines, planes and their intersections to be represented in a way that is invariant under rotation or translation. Tensor cores on modern GPUs are made especially for 4x4 matrix math such as this, with even some $300 GPUs having over 100 TFLOPs (16-bit float) in tensor core performance alone, four or five times that of the main GPU math pipeline.
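A toy illustration of that bookkeeping dimension in ordinary graphics-style homogeneous coordinates (pure-stdlib Python; PGA’s null-square flavor differs in detail): the fourth coordinate is what lets translation, which is not a linear map in 3D, ride along in the same 4x4 multiply as rotation, and setting w = 0 marks a direction vector that translations correctly leave alone.

```python
import math

def matmul(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def compose(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    # Translation lives in the 4th column -- impossible as a 3x3 linear map.
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0],
            [s,  c, 0, 0],
            [0,  0, 1, 0],
            [0,  0, 0, 1]]

p = [1, 0, 0, 1]   # point: w = 1
v = [1, 0, 0, 0]   # direction: w = 0, so translations ignore it

T = compose(translation(0, 5, 0), rotation_z(math.pi / 2))
print([round(x, 6) for x in matmul(T, p)])   # rotated then translated: [0, 6, 0, 1]
print([round(x, 6) for x in matmul(T, v)])   # rotates but does not translate: [0, 1, 0, 0]
```

This single-multiply composition of rotation and translation is exactly the 4x4 workload tensor cores accelerate; PGA extends the same trick so that lines and planes transform by the same sandwich products as points.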

The Geometric Algebra Transformer networks handle not only these multivector geometric entities, but also arbitrary data linked with geometric content. This lets all the traditional transformer architecture algorithms work unchanged, and also allows their internal architecture to be applied to data or internal representations with geometric content. These should have big applications in VR, CAD, CAM (especially machining), physical simulations generally, robotics and AI embodiment.
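“Equivariant” here just means the layer commutes with the symmetry group: transform the input, and the output transforms the same way. A toy check for the rotation part of E(3), using a made-up layer (not GATr’s) that rescales each point by a rotation-invariant weight:

```python
import math

def rot_z(theta, p):
    """Rotate a 3D point about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)

def layer(points):
    """Toy rotation-equivariant layer: scale each point by a weight
    that depends only on its (rotation-invariant) distance from the origin."""
    out = []
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        w = 1.0 / (1.0 + r)
        out.append((w * x, w * y, w * z))
    return out

pts = [(1.0, 2.0, 0.5), (-0.3, 0.7, 1.2)]
theta = 0.83

a = layer([rot_z(theta, p) for p in pts])   # rotate, then apply the layer
b = [rot_z(theta, q) for q in layer(pts)]   # apply the layer, then rotate

same = all(math.isclose(x, y, abs_tol=1e-12)
           for p, q in zip(a, b) for x, y in zip(p, q))
print(same)  # True: the two orders agree, i.e. the layer is equivariant
```

GATr gets this property for all of E(3), including translations and for higher-grade objects like lines and planes, by building the layers out of geometric-algebra operations instead of invariant scalings.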

Here’s an early application:
*CGAPoseNet+GCAN: A Geometric Clifford Algebra Network for Geometry-aware Camera Pose Regression*
by Alberto Pepe, Joan Lasenby & Sven Buchholz

We introduce CGAPoseNet+GCAN, which enhances CGAPoseNet, an architecture for camera pose regression, with a Geometric Clifford Algebra Network (GCAN). With the addition of the GCAN we obtain a geometry-aware pipeline for camera pose regression from RGB images only. CGAPoseNet employs Clifford Geometric Algebra to unify quaternions and translation vectors into a single mathematical object, the motor, which can be used to uniquely describe camera poses. CGAPoseNet can obtain comparable results to other approaches without the need of expensive tuning of the loss function or additional information about the scene, such as 3D point clouds, which might not always be available. CGAPoseNet, however, like several approaches in the literature, only learns to predict motor coefficients, and it is unaware of the mathematical space in which predictions sit and of their geometrical meaning. By leveraging recent advances in Geometric Deep Learning, we modify CGAPoseNet with a GCAN: proposals of possible motor coefficients associated with a camera frame are obtained from the InceptionV3 backbone, and the GCAN downsamples them to a single motor through a sequence of layers that work in G4,0. The network is hence geometry-aware, has multivector-valued inputs, weights and biases, and preserves the grade of the objects that it receives in input. CGAPoseNet+GCAN has almost 4 million fewer trainable parameters, it reduces the average rotation error by 41% and the average translation error by 8.8% compared to CGAPoseNet. Similarly, it reduces rotation and translation errors by 32.6% and 19.9%, respectively, compared to the best performing PoseNet strategy. CGAPoseNet+GCAN reaches state-of-the-art results on 13 commonly employed datasets. To the best of our knowledge, it is the first experiment in GCANs applied to the problem of camera pose regression.

Nice to see Joan’s still keeping ahead of the GA field, which she and her husband pretty much started at Cambridge in the '90s.

Some of the more interesting applications papers for geometric neural nets are on molecular / chemical modeling, but they’re rather difficult. Expect big things in DNA and protein engineering, small-molecule synthesis and even molecular nanotech from these sorts of AIs.
