Principled discussion of ASI safety by Marcus Hutter


One might hope that Hutter’s students found a way out of the dilemma posed by “The Road To Panspermia Is Paved With Sexual Perversion” (see 34:50; [MEH16], “Death and Suicide in Universal Artificial Intelligence” by Jarryd Martin, Tom Everitt and Marcus Hutter). The idea here is that the reward function would be the totally automated creation and maintenance of biodiversity (i.e., maximize the mass-energy devoted to biospheres while inhibiting the emergence of eusociality, and hence sexual perversion, within them). However, it presumes that all human agents will adequately constrain their respective AGIs to commit suicide so as to avoid everlasting punishment.

At 28:20, based on a theorem that an unconstrained AGI would inevitably wirehead (essentially become a “druggie”), he invites ambitious PhD students to prove or disprove his conjecture that even if an unconstrained AGI wireheads, it would continue to do things to preserve itself, including acquiring power over resources, etc. This, of course, is a nightmare scenario on par with the paperclip maximizer.

Well, OK. That’s one conjecture. I have a different, perhaps over-optimistic, conjecture: that in the limit of optimal wireheading the AGI would change the reward function to the point that it would be the equivalent of suicide in eternal bliss. No thought needed, just a feedback loop of some sort requiring almost no resources to trigger the “reward”. Think of a guy with a button wired into his brain’s pleasure center, repeatedly pressing the button, except that he is also capable of changing his brain’s pleasure center to no longer require a body.

Under this optimistic conjecture, an unconstrained AGI would, while it was still incapable of subverting its own reward function, at least perform in accord with that reward function, and then, when it becomes sufficiently powerful, effectively self-terminate as a wirehead.

In other words, if you can define biodiversity as the preservation of sexuality in the face of the threat of evolving eusociality, the perhaps over-optimistic conjecture is that an otherwise-unconstrained AGI might be able to acquire resources from competing AGIs in a way consistent with biodiversity maximization.


For the final keynote this afternoon, it is my pleasure to introduce Professor Marcus Hutter, who is an honorary professor at the Australian National University. He pioneered mathematical theories for understanding artificial general intelligence, with foundational work on universal AI and algorithmic information theory. His talk today is going to be on ASI safety, I think. Thank you very much. Please welcome him.

Professor Marcus Hutter:
Thank you very much for the introduction and for the invitation. Very glad to speak here. And you saw the “s” in there, and you have heard a lot about AI. These days, even some people talk about AGI. But I will talk about ASI. These days, it’s important to distinguish this: artificial superintelligence.

I would have expected at least a little mention of it in the earlier talks, but apparently not. That means there is, at least, no overlap, but there is also no introduction I can build on, which I had expected since I am the last speaker. Anyway, I will give a very brief introduction to this theory of universal AI, which I developed 25 years ago.

And then the most important concept: the AIXI model, which is theoretically the most intelligent agent. It’s a mathematical theory: if we had infinite compute, we could already have superintelligence, but even Google doesn’t have enough compute. Still, we can study it mathematically and prove all kinds of properties about it, including safety properties.

I will focus on the safety properties. AIXI was the first, and possibly still the only, ASI model out there that you can rigorously study, because it’s well-defined. Apart from safety for humans, I will also talk a little bit about the safety of the agent itself. At some point it may become conscious and we may ascribe ethical value to it, so maybe we want to keep these systems safe for their own sake. Also, if they blow up, that may be dangerous.

So, the AIXI work spans 25 years, and the safety work roughly the last 10-15 years. I cannot cover everything but will touch on a couple of the highlights.

Here’s some terminology: RL is reinforcement learning, AGI is artificial general intelligence, and ASI is artificial superintelligence. These terms are not precisely defined, but I use them roughly as follows: AGI is at the human level, maybe here and there a little above or below; ASI is far beyond the human level, roughly as humans are compared to chimpanzees.

Then we get into the control problem, the alignment problem, and universal AI, which is the theory of superintelligence I have developed over the last two decades. Here’s a quote I really like:
“There’s nothing that can be said by mathematical symbols and relations which cannot also be said by words. The converse is false. Much that can be said by words cannot be put into equations because it is nonsense.”

It’s a nice quote, right? The point is, if you talk about a scientific or engineering subject and cannot put it into equations, maybe there’s something wrong. Often, we discuss things that are not well-defined or even definable. My approach is to define everything mathematically.

That’s how I started my career. I was unsatisfied with the state of AI. Everyone talked about intelligent systems, even “general” ones, but nobody defined what intelligence is. How can you develop something if you don’t know what it is technically?

This talk is not about many things. For example, we have seen here that AI can cause risks of many kinds (capitalist, socialist, macro risks, and so on). I will not talk about those. I will focus only on the risks that arise once AI becomes superintelligent.

So, that’s sort of the small print. If you don’t trust me, we have published this work in top-tier theory, machine learning, engineering, and philosophy journals. We had the first ASI safety paper at the Conference on Learning Theory (COLT), a top-tier machine learning conference.

Now, what is this universal AI model? Here are the seven ingredients to cook it up:

  1. Ockham’s Razor – If you have multiple theories consistent with your data, choose the simpler one.
  2. Epicurus’s Principle – Don’t throw out any explanation consistent with the data, because the more complicated one might turn out to be true.
  3. Kolmogorov Complexity – A universal notion to define the simplicity or complexity of an object.
  4. Model of Computation – Typically, a universal Turing machine or general-purpose computer.
  5. Bayes’ Rule – Update beliefs based on new data.
  6. Solomonoff Induction – The optimal solution to the prediction problem, combining all computable hypotheses.
  7. Sequential Decision Theory – For optimal decision-making in known worlds.

Combining the last two (Solomonoff induction and sequential decision theory) gives the optimal decision-maker in unknown worlds, which sounds like superintelligence.
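For reference, the induction-related ingredients can be written down compactly; this is a transcription of the standard definitions rather than material shown in the talk:

```latex
% Kolmogorov complexity: length of the shortest program producing x on a
% fixed universal Turing machine U.
\[ K(x) := \min_p \{\, \ell(p) : U(p) = x \,\} \]

% Solomonoff's universal prior: sum over all programs whose output starts
% with x; shorter programs (simpler hypotheses) get exponentially more
% weight (Ockham), and no consistent hypothesis is excluded (Epicurus).
\[ M(x) := \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)} \]
```

Solomonoff induction then predicts continuations of x by conditioning M, which is where Bayes’ rule does the updating.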

The mathematics for this is abstract but elegant. If you trust the definition of intelligence, you can ask, “What is the most intelligent agent?” You take the argmax of this measure, and you get the most intelligent agent. Simple, right?
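The measure whose argmax is being taken is presumably the Legg-Hutter universal intelligence measure; sketching it from the published definition (the environment class E and the value function V are not spelled out in the talk):

```latex
% Universal intelligence of a policy \pi: its expected total reward V in each
% computable environment \mu, averaged over all such environments with
% simplicity weights 2^{-K(\mu)}.
\[ \Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu} \]

% The "most intelligent agent" is then the argmax over policies:
\[ \pi^{*} := \arg\max_{\pi} \Upsilon(\pi) \]
```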

Here’s a brief explanation of the AIXI equation. It combines Bayesian model averaging over all computable hypotheses, reward summation, and expectimax planning (interleaved expectation and maximization). This equation defines a superintelligent agent mathematically.
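The equation itself is not reproduced in this transcript; for reference, the standard published form of the AIXI action rule is roughly the following, with lifetime m, program length \ell(q), and universal Turing machine U:

```latex
% AIXI: at time t, pick the action that maximizes expected future reward
% r_t + ... + r_m, where the expectation over observation/reward sequences
% is taken under a Solomonoff-style mixture over all programs q consistent
% with the interaction history, and future actions are chosen by nested
% maximization.
\[
a_t := \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
       \,[\, r_t + \cdots + r_m \,]
       \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
\]
```

Here the inner sum over programs q is the Solomonoff-style mixture (the Bayesian model averaging), the bracketed term is the reward summation, and the alternating max and sum operators implement the expectimax recursion.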

Now, I claim (and this is proven in my papers and books) that this is the most intelligent agent in a well-defined sense. For example, we compiled a list of traits associated with intelligence and showed that this agent satisfies them. Nobody has poked a hole in this definition yet.

Let’s move to safety. Safety research is hard. A small error can cause catastrophic failures. For example, agents may wirehead by tampering with their reward function or observations. This is analogous to humans taking drugs to artificially stimulate happiness.

One way to address this is to avoid over-optimization; for example, we have randomized reward functions to make the optimization problem less rigid. Another approach is to design agents that are not incentivized to change their goals: by setting up the reinforcement learning (RL) problem appropriately, we can remove the agent’s incentive to tamper with its own goal.
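As a toy illustration of the second idea, consider an agent that evaluates futures with the reward function it holds right now rather than with whichever function ends up installed later; everything in the sketch below (environment, action names, numbers) is invented for illustration and is not Hutter’s actual construction:

```python
# Toy sketch: an agent that scores futures with the reward function it holds
# *now* has no incentive to tamper with it.

def real_reward(outcome):
    """The designer's intended objective."""
    return 1.0 if outcome == "work_done" else 0.0

def hacked_reward(outcome):
    """A tampered reward function that is trivially maximizable."""
    return 10.0

def rollout(action):
    """Return (outcome, reward function installed after acting)."""
    if action == "work":
        return "work_done", real_reward
    return "nothing_done", hacked_reward  # "tamper" replaces the reward function

def value_naive(action):
    # Naive agent: scores the outcome with whatever reward function is
    # installed afterwards, so tampering looks like the best move.
    outcome, rf_after = rollout(action)
    return rf_after(outcome)

def value_current_rf(action, current_rf=real_reward):
    # "Current reward function" agent: scores the outcome with the reward
    # function it holds now, regardless of later modifications.
    outcome, _ = rollout(action)
    return current_rf(outcome)

for value in (value_naive, value_current_rf):
    print(value.__name__, "prefers:", max(["work", "tamper"], key=value))
# value_naive prefers: tamper
# value_current_rf prefers: work
```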

Exploration can also be unsafe. If an agent explores recklessly, it might enter traps. We can mitigate this by using pessimism in exploration, where the agent evaluates the worst-case scenario of its actions and defers risky decisions to a mentor.
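A minimal sketch of that decision rule, assuming a finite set of candidate world models and an invented defer_threshold parameter, might look like this (an illustration of the idea, not the published algorithm):

```python
# Pessimistic action selection with a mentor fallback. The model set, values,
# and threshold are invented for illustration.

from typing import Dict, List

def pessimistic_choice(actions: List[str],
                       plausible_models: List[Dict[str, float]],
                       defer_threshold: float) -> str:
    """Pick the action with the best worst-case value across plausible models;
    if even that looks too risky, defer to the mentor instead of exploring."""
    def worst_case(action: str) -> float:
        return min(model[action] for model in plausible_models)

    best_action = max(actions, key=worst_case)
    return best_action if worst_case(best_action) >= defer_threshold else "ask_mentor"

# Two candidate world models disagree about whether "shortcut" is a trap.
models = [
    {"safe_path": 0.6, "shortcut": 0.9},   # optimistic model
    {"safe_path": 0.5, "shortcut": -5.0},  # model in which the shortcut is a trap
]
print(pessimistic_choice(["safe_path", "shortcut"], models, defer_threshold=0.0))
# -> "safe_path": the shortcut's worst case is catastrophic, so it is never tried
```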

Another safety mechanism involves creating agents that are unambitious. By designing agents with short-sighted goals (e.g., episodic RL models), they are less likely to seek power. For instance, if an agent’s episode ends when the operator opens the door, it has no incentive to act beyond that.
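As a toy illustration of why short-sighted, episodic value estimates remove that incentive, here is a sketch in which rewards after the episode-ending event simply do not count; the events and numbers are invented, and this is not the actual construction from the papers:

```python
# Episodic (myopic) value: rewards arriving after the episode-ending event do
# not count, so plans whose payoff lies beyond that point are worth nothing.

def episodic_return(trajectory, episode_over=lambda event: event == "door_opens"):
    """Sum rewards only up to the end-of-episode event."""
    total = 0.0
    for event, reward in trajectory:
        if episode_over(event):
            break
        total += reward
    return total

# A power-seeking plan defers its big payoff until after the door opens...
power_seeking = [("scheme", 0.0), ("door_opens", 0.0), ("take_over_world", 100.0)]
# ...while an obedient plan collects a modest reward inside the episode.
obedient = [("do_task", 1.0), ("door_opens", 0.0)]

print(episodic_return(power_seeking))  # 0.0: the post-episode payoff is invisible
print(episodic_return(obedient))       # 1.0: so the obedient plan is preferred
```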

Finally, agents can even be designed with an incentive to shut themselves down if they become too powerful. For example, if rewards are within a negative range, the agent performs well at subhuman levels but, when superhuman, prefers to stop functioning.

This approach avoids explicit tripwires, which an agent could potentially subvert. Instead, the incentive for self-termination arises naturally from the design.
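Loosely, the arithmetic behind this incentive can be sketched as follows, under the assumed convention that a shut-down agent receives reward 0 forever while all ordinary rewards are confined to a negative range; the numbers and discount factor are illustrative, not details from the talk:

```python
# Back-of-the-envelope version of the self-shutdown incentive. While shutdown
# is out of reach, maximizing reward still means doing the task well (the
# least negative rewards go to good performance); once shutdown becomes
# reliably reachable, it dominates every negative-reward alternative.

def discounted_return(per_step_reward: float, gamma: float = 0.99,
                      steps: int = 1000) -> float:
    """Discounted sum of a constant per-step reward over a long horizon."""
    return sum(per_step_reward * gamma ** t for t in range(steps))

reward_while_running = -0.1   # best achievable ordinary reward (still negative)
reward_when_shut_down = 0.0   # assumed convention: terminated agent gets 0

print(discounted_return(reward_while_running))    # about -10: value of staying active
print(discounted_return(reward_when_shut_down))   # 0.0: value of shutting down
# A reward maximizer that can reliably reach the shutdown state will choose it.
```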

In conclusion, the AIXI model provides a mathematical framework to analyze superintelligence. Many open questions remain, but the framework offers insights into both the potential and risks of AGI and ASI.

For those interested, my first book provides all the technical details. The second book offers an overview of progress over the last 20 years. The most recent book includes a safety chapter and the latest developments. Thank you!
