Chollet is quite explicit that the idea is not to have a program solve the problems directly, but to have the program write a program for each example solution (or set of examples) which will be able to solve all problems of that type, which seems like a very good idea in general. He doesn’t seem to have tried Rasch measures or item-response theory, so he doesn’t yet have a real (ratio, or at least equal-interval) measure of intelligence, nor, equivalently, a measure of the difficulty of the questions, but that should not be too difficult to add.
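The "write a program per task" idea can be made concrete with a toy sketch: enumerate a tiny DSL of grid transformations and keep the first program consistent with every training pair. The DSL, task format, and function names below are illustrative only, not Chollet's actual benchmark code.

```python
# Minimal program-synthesis sketch for ARC-style grid tasks:
# search a small DSL for a program matching all (input, output) examples.

def identity(g):   return [row[:] for row in g]
def flip_h(g):     return [row[::-1] for row in g]       # mirror left-right
def flip_v(g):     return g[::-1]                        # mirror top-bottom
def rotate_cw(g):  return [list(r) for r in zip(*g[::-1])]
def rotate_ccw(g): return [list(r) for r in zip(*g)][::-1]

DSL = [identity, flip_h, flip_v, rotate_cw, rotate_ccw]

def synthesize(train_pairs):
    """Return the first DSL program that maps every input to its output."""
    for prog in DSL:
        if all(prog(x) == y for x, y in train_pairs):
            return prog
    return None

# Toy task: every output is the input mirrored left-to-right.
train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]]),
         ([[5, 0]],          [[0, 5]])]
prog = synthesize(train)
print(prog.__name__)            # flip_h
print(prog([[7, 8], [9, 0]]))   # [[8, 7], [0, 9]]
```

Real ARC tasks of course need a far richer DSL and a smarter search than brute enumeration, but the shape of the idea is the same: the output of learning is a program, not an answer.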

It’s remarkable to me how intentionally crippled the current AIs still are: mostly no web search, no ability to read large documents or websites, not even the ability to use any kind of software, and it’s the last that is the easiest and most useful. Current code generators don’t even check whether their produced code throws errors, let alone works, because they can’t run it. But not too long from now, they’ll have all of GitHub at their disposal, their generated programs will themselves be code generators, all levels of code will be full of introspection and debugging aids, and the AI will be able to wrap things in a “do what I mean” graphical / VR / voice interface specially tailored to any task, making textual code something that rarely needs to be worked on directly.

Papers I was reading today:

Relevant to compressing physical / spatial intuition into transformer kernels:

**Geometric Algebra Transformer**

by Johann Brehmer, Pim de Haan, Sönke Behrends & Taco Cohen. 37th Conference on Neural Information Processing Systems (NeurIPS 2023).

Abstract

Problems involving geometric data arise in physics, chemistry, robotics, computer vision, and many other fields. Such data can take numerous forms, for instance points, direction vectors, translations, or rotations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a general-purpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric (or Clifford) algebra, which offers an efficient 16-dimensional vector-space representation of common geometric objects as well as operators acting on them. GATr is equivariant with respect to E(3), the symmetry group of 3D Euclidean space. As a Transformer, GATr is versatile, efficient, and scalable. We demonstrate GATr in problems from n-body modeling to wall-shear-stress estimation on large arterial meshes to robotic motion planning. GATr consistently outperforms both non-geometric and equivariant baselines in terms of error, data efficiency, and scalability.

The algebra isn’t hard; it’s basically the same 3D (+1) representation that computer graphics has used since forever, with the extra dimension being a null-square “bookkeeping” dimension that makes projective calculations easier, particularly projecting from a 3D scene down to a 2D computer screen. The geometric algebra part of this also allows not just point vectors, but also lines, planes, and their intersections to be represented in a way that is invariant under rotation or translation. Tensor cores on modern GPUs are built especially for 4x4 matrix math such as this, with even some $300 GPUs having over 100 TFLOPS (16-bit float) of tensor-core performance alone, four or five times that of the main GPU math pipeline.
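The graphics version of that “3D (+1)” trick is easy to show: homogeneous coordinates let one 4x4 matrix express translation and perspective projection, which is exactly the shape of work tensor cores are optimized for. The focal length and point below are made-up illustrative values, and this is the plain graphics construction, not the PGA formulation the paper uses.

```python
# Pinhole projection with homogeneous coordinates: a 4x4 matrix plus a
# divide-by-w maps a 3D point to 2D screen coordinates.

def matvec4(M, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

f = 2.0  # focal length of a hypothetical pinhole camera
# Copies x and y (scaled by f) and moves z into the w slot, so the
# perspective divide below shrinks distant points toward the center.
P = [[f, 0, 0, 0],
     [0, f, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 1, 0]]

point = [1.0, 2.0, 4.0, 1.0]   # (x, y, z) with homogeneous w = 1
x, y, z, w = matvec4(P, point)
screen = (x / w, y / w)        # perspective divide
print(screen)                  # (0.5, 1.0)
```

The null-square dimension in PGA plays the same bookkeeping role as the w column here, which is why the 16-dimensional multivectors still reduce to tensor-core-friendly small dense products.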

The Geometric Algebra Transformer networks handle not only these multivector geometric entities, but also arbitrary data linked with geometric content. This lets all the traditional transformer architecture algorithms work unchanged, and also allows their internal architecture to be applied to data or internal representations with geometric content. These should have big applications in VR, CAD, CAM (especially machining), physical simulations generally, robotics and AI embodiment.
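To make the “multivectors plus arbitrary data” point concrete, here is a sketch of how a token stream might pack both. The 16-slot blade ordering and the trivector embedding of points follow one common PGA convention; slot order varies between libraries and this is not GATr’s actual API, so treat the indices as illustrative.

```python
# One GATr-style token: 16-component PGA multivector channels carrying the
# geometry, alongside plain scalar channels for non-geometric features.
# Assumed blade layout for G(3,0,1):
# [scalar | e0 e1 e2 e3 | e01 e02 e03 e12 e31 e23 | e021 e013 e032 e123 | e0123]

def embed_point(x, y, z):
    """Embed a 3D point as a PGA trivector (homogeneous weight 1 on e123)."""
    mv = [0.0] * 16
    mv[11], mv[12], mv[13] = x, y, z   # e021, e013, e032 (convention-dependent)
    mv[14] = 1.0                        # e123: the homogeneous coordinate
    return mv

def make_token(points, scalars):
    """Bundle multivector channels with ordinary scalar channels."""
    return {"multivectors": [embed_point(*p) for p in points],
            "scalars": list(scalars)}

tok = make_token(points=[(1.0, 2.0, 3.0)], scalars=[0.5, -1.0])
print(len(tok["multivectors"][0]))   # 16
print(tok["multivectors"][0][14])    # 1.0
```

Because every channel is either a multivector transforming equivariantly or a scalar ignored by E(3), the standard attention machinery can operate on these tokens unchanged, which is the versatility claim in the abstract.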

Here’s an early application:

**CGAPoseNet+GCAN: A Geometric Clifford Algebra Network for Geometry-aware Camera Pose Regression**

Alberto Pepe, Joan Lasenby, Sven Buchholz

We introduce CGAPoseNet+GCAN, which enhances CGAPoseNet, an architecture for camera pose regression, with a Geometric Clifford Algebra Network (GCAN). With the addition of the GCAN we obtain a geometry-aware pipeline for camera pose regression from RGB images only. CGAPoseNet employs Clifford Geometric Algebra to unify quaternions and translation vectors into a single mathematical object, the motor, which can be used to uniquely describe camera poses. CGAPoseNet can obtain comparable results to other approaches without the need of expensive tuning of the loss function or additional information about the scene, such as 3D point clouds, which might not always be available. CGAPoseNet, however, like several approaches in the literature, only learns to predict motor coefficients, and it is unaware of the mathematical space in which predictions sit and of their geometrical meaning. By leveraging recent advances in Geometric Deep Learning, we modify CGAPoseNet with a GCAN: proposals of possible motor coefficients associated with a camera frame are obtained from the InceptionV3 backbone, and the GCAN downsamples them to a single motor through a sequence of layers that work in G4,0. The network is hence geometry-aware, has multivector-valued inputs, weights and biases, and preserves the grade of the objects that it receives in input. CGAPoseNet+GCAN has almost 4 million fewer trainable parameters, it reduces the average rotation error by 41% and the average translation error by 8.8% compared to CGAPoseNet. Similarly, it reduces rotation and translation errors by 32.6% and 19.9%, respectively, compared to the best performing PoseNet strategy. CGAPoseNet+GCAN reaches state-of-the-art results on 13 commonly employed datasets. To the best of our knowledge, it is the first experiment in GCANs applied to the problem of camera pose regression.

Nice to see Joan’s still keeping ahead of the GA field, which she and her husband pretty much started at Cambridge in the '90s.

Some of the more interesting applications papers for geometric neural nets are on molecular / chemical modeling, but they’re rather difficult. Expect big things in DNA and protein engineering, small-molecule synthesis and even molecular nanotech from these sorts of AIs.