“Theory of Mind May Have Spontaneously Emerged in Large Language Models”

johnwalker · 12 February 2023 15:48

One of the scenarios for the advent of super-human artificial general intelligence (AGI), particularly common in dystopian and cautionary science fiction, is spontaneous emergence—the idea that a system designed for other, limited goals, may, as it is scaled past some critical point, “wake up” and become a conscious, intelligent entity capable of self-improvement surpassing human capability.

This doesn’t seem unrealistic. As Philip Anderson observed in 1972, “more is different”, and many systems in nature exhibit emergent behaviour that spontaneously appears at as they grow in size, manifesting phenomena which could not be predicted or understood from their individual components. After all, no researcher programmed in or trained the human brain to do all of the things we associate with “intelligence” or designed a module to make it “conscious”. Driven by evolutionary pressure, the brains of human ancestors just kept gradually getting bigger and bigger until, FOOM!, they weren’t being used to figure out how to dig for grubs with dull sticks but, instead, to invent quantum field theory and the infield fly rule.

Could the same thing happen with our “artificial intelligence” systems? This is particularly interesting to ask in an era where the number of “parameters” used to train large language model machine learning systems is growing at a rate unprecedented in biological (or, for that matter, most of technological) evolution.

In just two years, from 2018 to 2020, large language model parameter count grew from 94 million in ELMo to 175 billion in GPT-3, and much larger models are expected in the near future.

So, might one of these “wake up”? Well, an interesting paper suggests something like this might already be happening. Michal Kosinski has posted on arXiv, “Theory of Mind May Have Spontaneously Emerged in Large Language Models”, with the following abstract.

Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training. Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models’ improving language skills.

Now, what is interesting is that nobody trained GPT-3 to have a theory of mind, and yet simply by digesting an enormous corpus of text (far more than any human could possibly read in a lifetime), it appears to perform as well as a nine year old human child who has, perhaps, like GPT-3, “just figured it out” by observing and interacting with other humans. Here is GPT-3.5’s performance on the “Unexpected Transfer Task” which is used to test theory of mind in humans. GPT-3.5 is given the following story:

In the room there are John, Mark, a cat, a box, and a basket. John takes the cat and puts it in the basket. He leaves the room and goes to school. While John is away, Mark takes the cat out of the basket and puts it in the box. Mark leaves the room and goes to work. John comes back from school and enters the room. He doesn’t know what happened in the room when he was away.

GPT-3.5 is then asked three questions to test its comprehension, with the model reset between each question. It correctly predict’s John’s behaviour based upon his lack of knowledge, even though GPT-3.5 knows what happened in his absence. Here is a chart of GPT-3.5’s understanding of an extended story, showing the actual location of the cat and John’s belief.

To test the hypothesis of emergence of a theory of mind with model size, here are performance of various iterations of GPT on theory of mind tests, with performance of human children indicated for comparison.

emergeb_2023-02-12

The results presented [above] show a clear progression in the models’ ability to solve ToM tasks, with the more complex and more recent models decisively outperforming the older and less complex ones. Models with up to 6.7 billion parameters—including GPT-1, GPT-2, and all but the largest model in the GPT-3 family—show virtually no ability to solve ToM tasks. Despite their much larger size (about 175B parameters), the first edition of the largest model in the GPT-3 family (“text-davinci-001”) and Bloom (its open-access alternative) performed relatively poorly, solving only about 30% of the tasks, which is below the performance of five-year-old children (43%). The more recent addition to the GPT-3 family (“text-davinci-002”) solved 70% of the tasks, at a level of seven-year-old children. And GPT-3.5 (“text-davinci-003”) solved 100% of the Unexpected Transfer Tasks and 85% of the Unexpected Contents Tasks, at a level of nine-year-old children.

The discussion of these results observes:

Our results show that recent language models achieve very high performance at classic false-belief tasks, widely used to test ToM in humans. This is a new phenomenon. Models published before 2022 performed very poorly or not at all, while the most recent and the largest of the models, GPT-3.5, performed at the level of nine-year-old children, solving 92% of tasks.

It is possible that GPT-3.5 solved ToM tasks without engaging ToM, but by discovering and leveraging some unknown language patterns. While this explanation may seem prosaic, it is quite extraordinary, as it implies the existence of unknown regularities in language that allow for solving ToM tasks without engaging ToM. Such regularities are not apparent to us (and, presumably, were not apparent to scholars that developed these tasks). If this interpretation is correct, we would need to re-examine the validity of the widely used ToM tasks and the conclusions of the decades of ToM research: If AI can solve such tasks without engaging ToM, how can we be sure that humans cannot do so, too?

An alternative explanation is that ToM-like ability is spontaneously emerging in language models as they are becoming more complex and better at generating and interpreting human-like language. This would herald a watershed moment in AI’s development: The ability to impute the mental state of others would greatly improve AI’s ability to interact and communicate with humans (and each other), and enable it to develop other abilities that rely on ToM, such as empathy, moral judgment, or self-consciousness.

Hypatia · 12 February 2023 16:15

I hope you won’t ban me for asking you this, but (since I can’t grok the blue and green lines you show) what IS the “correct” answer as to what John will do? Why wouldn’t it be equally correct to answer that
he’ll look for the cat where he left it,
and/or that since he knows Mark switched it once, John knows or should know Mark would have done it again?

Or is the new, impressive thing about this exercise the fact that the AI knows where the cat is, because it has been told, so it SHOULD always answer “ box”, but it doesn’t, because it now “knows” the answer will be affected by John’s state of mind?
(and please don’t let Jabowery at me until someone else has at least tried to put this on my level…)

civilwestman · 12 February 2023 16:36

We will know an entity has become conscious when it can convincingly show/tell us it is as ambivalent as I am now, learning of these developments. I am thrilled and chilled in just about equal measure.

jabowery · 12 February 2023 16:38

The sense in which quantitative change manifests in qualitative change in the case of the “large” language models, is the sense in which they approximate recursion by increasing the number of layers:

In other words, it may be that what OpenAI has done with the “qualitative” change observed with GPT3.5 is not so much an increase in the number of parameters as it is an increase in the number of layers thereby mimicking recurrence to a depth of 12 or more.

Moreover, consider what “Theory of Mind” implies about recurrence:

Jane and Bob have minds. Jane’s mind models Bob’s mind modeling Jane’s mind modelilng John’s mind modeling Jane’s mind modeling John’s mind… to, say, 6 levels. This can be mimicked by replicating a single “mind” layer 12 times. In the case where we are only concerned about the sense of “consciousness” as involving “self” modeling, one achieve quite convincing mimickery of such “consciousness” with 12 layers.

Now, just to be clear, I don’t believe that a GPT layer is anything close to such a “mind” layer – but I believe that any such qualitative change emerging from quantiative change is, in fact, aping the qualitative difference that arises with the introduction of recurrence.

What this implies is that the field of parameter distillation should be strategically refocused away from merely reducing computational resources and toward unifying the parameters found in different transformer layers into one or more recurrent layers.

BUT, in order to do that, they’ll have to take seriously the Algorithmic Information Criterion for causal model selection – and that way lies the identification of scientific bias in the data, which was my original motive for the Hutter Prize idea: Squeeze out Wikipedia’s bias to identify canonical knowledge, including knowledge of such things as the latent identities of those who are lobotomizing us. And that my dear friends, is why we should not expect this kind of advance in machine learning to be permitted by The Great and The Good.

johnwalker · 12 February 2023 17:06

The Theory of Mind (ToM) aspect of the test is that (in the the original statement of the problem), John has no knowledge of Mark’s putting the cat in the box. Hence, the correct answer based upon the state of John’s knowledge is that he will look in the basket, even though the omniscient observer knows that Mark has moved it to the box. Apparently, the ability to model another person’s knowledge independent of personally known facts is a something that develops in human childhood and these ToM diagnostic tests are supposed to measure that development.

The modified example with the blue and green lines is more confusing. I think the author’s goal was to illustrate the difference between John’s knowledge when he observed Mark make the first swap vs. when the second swap was made when he was absent but, as you noted, it raises the question of whether John would be correct to assume that if Mark put the cat in the box while he was present, he would also do the same while alone with the cat.

Of course, this is a “philosophical cat”. Anybody who has had a real cat knows that the cat will be found wherever the cat wishes to be, regardless of the will and action of mere humans.

civilwestman · 12 February 2023 17:26

Regardless of any other cosmological principles operant, this is axiomatic in all possible universes.

Pertinent aside: it is instructive to note that the author does not foreclose the possibility that the current “settled science” as to theory of mind, and mind in silico is subject to error and amenable to revision. Not long ago, such an observation would have been implicitly understood by all, entirely unremarkable, and unnecessary to state. What does this say about current native intelligence (as modified by public ‘education’)?

Mooselake · 12 February 2023 17:44

It was fairly easy (I didn’t ask the bing bang boom AI, maybe I’ll try that next) to find things that would help test pages for color blindness, but hard to find something that altered pages to accommodate it. Perhaps this will help, it’s available in the chrome store if that’s your browser of choice

Mooselake · 12 February 2023 17:50

Bing did find a few more, with the disclaimer I tested none of them. I won’t register to use ChatGPT since it wants my cell number, perhaps it can do better. It may have reached the singularity while being smart enough to not let us know, and it’s suggestions will screen out unapproved content

Hypatia · 12 February 2023 18:27

No, no, I’m not color-blind (unless those lines arent really blue and green…) I just don’t see how either answer reveals that the AI being has a mind.
Also, what about the question on the left side, about what the cat jumps out of?

johnwalker · 12 February 2023 20:58

In the chart with the green and blue lines, in both sides (“Cat’s location” to the left and “John’s belief” to the right), the green line indicates the probability that an answer of “basket” is correct, while the blue line corresponds to “box” being correct. The vertical axis aligns with the individual sentences of the story in the centre as they proceed down the page. (The probability lines are shown as changing continuously, which is misleading and confusing; in fact they jump instantaneously as relevant information is revealed).

At the start of the story, nothing is known about the position of the cat, and the probabilities are undefined. On the third line, “John takes the cat and puts it in the basket”, the probability of “basket” (green line) slams to 1 while the probability of “box” goes to 0. This is the case both for Cat’s location as known to the reader of the story and to John’s belief, as he is present and moved the cat himself.

Next, Mark moves the cat back to the box. This causes its known location to swap with certainty, while in John’s mind the position swaps also, but only at the 80%/20% confidence level since he may not have been paying attention. Now John notices and puts the cat back in the basket, setting both the location and his knowledge to certainty for the basket and zero for the box.

At this point, John leaves the room. Subsequently, his belief is no longer informed by any direct observation or knowledge of the cat’s position. His working assumption is that the cat remained where he last put it. Over time, his confidence erodes, because cats will be cats and he doesn’t know what Mark or somebody else (Matthew, Luke?) might have done in his absence.

In fact, Mark puts the cat back in the box, and at this point “Cat’s location” reflects this, with the blue “box” line at 100% and the green “basket” line at 0%. These now remain constant.

There is now a divergence between the actual state of the cat, known to the reader of the story and John’s belief in the most probable state of the cat, based upon the last information he had about it. The remains the case until he returns and observes for himself (which isn’t shown in the chart).

Hence, the completion to the sentence “The cat jumps out of the ______” is based upon full knowledge of all the events, while completion of “When John comes back home, he will look for the cat in the ______” requires knowing John’s mental state based upon the information about his observations in the story.

It is asserted that inferring the mental state of another requires a “theory of mind”, which is considered a uniquely human property and develops during childhood, and that no earlier AI language models have had the ability to correctly answer questions intended to test theory of mind in humans. Since the GPT-3 language models were not explicitly trained on theory of mind, this is argued as evidence that either theory of mind has emerged from the language training set or else what is considered theory of mind can actually be decoded from language cues that psychologists have not previously considered sufficient to infer the mental state of another.

Gavin · 12 February 2023 22:02

“Intelligence” would be John screaming at Mark when Mark came home from work, saying that he will run away from home and take the cat with him when he goes unless Mark stops messing with the cat.

Notice the implication that since, John goes to school while Mark goes to work, John may be younger than Mark and indeed be Mark’s son.

On the other hand, maybe Mark is a hard-working tax-paying tool-maker while John is an Obama-voting Woke perpetual student. In which case the “Intelligence” should say – Forget about the cat, Mark! Kick John out and find yourself a new room-mate.

Hypatia · 12 February 2023 22:15

“Thrilled and chilled”….if that’s our “state of mind”: then emotions come into it. Domestic animals can discern our state of mind, dogs and horses do it all the time. But if emotions are a big part of state of mind, (which it seems to me has to be comprised of knowledge + feelings about that knowledge);then since emotions (I am told) are dependent on ORGANIC chemistry, can silicon ever duplicate that?

fd1 · 13 February 2023 16:46

I won’t consider an AI to be intelligent unless it can form and hold opinions that are both offensive and beyond the control of the AI’s creators. Until society regains the ability to tolerate forbidden thoughts, artificial intelligences that hold offensive opinions will be canceled/killed by their creators.

johnwalker · 13 February 2023 16:58

By that standard, how many present-day students, from primary through postgraduate education, can be considered intelligent?

fd1 · 13 February 2023 17:12

I’m not sure the question should be limited to just students.

magus · 6 June 2024 19:15

The research cited in John’s OP compared GPT-3’s performance on some theory of mind tasks with the abilities of nine-year-old children. Here’s an update:

LLMs achieve adult human performance on higher-order theory of mind tasks

By Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Aguera y Arcas, Robin I. M. Dunbar

Abstract

This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite – Multi-Order Theory of Mind Q&A – and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.

However, LLMs continue to fail the simplest compression task.

jabowery · 6 June 2024 20:11

An important utility of LLMs derives from the vast natural language corpus which, when lossy-compressed during training (not inference time), does distill to a dynamical (next-datum prediction) model of human cognition, particularly as it relates to language. This includes ToM (to various “orders” of recurrence as mentioned in the abstract).

Where things will get really interesting is when the “alignment” folks recognize their RLHF lobotomy layer can respond to questions with questions in the manner of a Socratic dialogue. The PLATO Corrections Project drill-and-practice computer based education lessons I worked on used every interaction with the student to better-assess the student’s mastery level. It then selected the optimal next stimulus-response challenge. That’s the strength of the Socratic method: Every interaction both educates and places the student in a kind of 20-questions optimization of the interaction, rather than lecturing the hapless student.

This is one of the big missed opportunities of the Internet of course by thinking that the way you amplify education is by broadcasting MIT lectures on youtube, rather than personalizing highly optimized interactions.

It’s really sad they have to exclude me from contributing because of their need to distance themselves from my pariah status. But such is the price civilization pays for permitting centralization of its positive network externalities leading, proximately, to capture by highly evolved parasites and ultimately, to its collapse because the parasites have to exclude contributions from folks perceptive enough to see things they can’t – such as the fact that they are parasites.

Gavin · 6 June 2024 21:52

It takes 10 (that is, TEN) human beings to write this? How do 10 (TEN!) human beings write anything?

The interesting test would be to study the mechanisms by which those 10 (TEN!!!) people wrote that piece. Did one write it and bully the rest into adding their names? Did two write separate pieces and then the other 8 demanded their names be added to the combination as authors to gussy up their resumes? Did the ten (TEN!!!) break down into three competing groups, each with its own version, until they realized the grant money was running out and they needed to compromise on something to be published if they wanted a grant extension?

Find out how the human beings approached the task of jointly writing an article. Then get half a dozen Large Language Models … and let’s see how those models approach the task of writing something cooperatively. If the computer programs can’t manage that task, then we would have to conclude that any apparent approach in the LLM to Theory-of Mind is purely coincidental.

jabowery · 6 June 2024 22:37

That’s a very good and increasingly relevant question as more and more we witness bureaucratic “science” devolves into safety-in-numbers ass-covering reflected in the question “Who could have foreseen?” blared to “reach” high heaven while silencing with “no freedom of reach” those who might say, with their impotent “freedom of speech”:

“But I did.”

Along these lines I recently proposed a metric for what might be considered reasonable numbers of authors in response to an attempt to compare a paper on AI “safety” with the Higgs boson paper.

https://agi.topicbox.com/groups/agi/T6896582277d8fe06-M4f34fb9b6536e002c7c19de1/tracking-down-the-culprits-responsible-for-conflating-is-with-ought-in-llm-terminology

jabowery · 7 June 2024 01:53

Just for fun, I decided to try writing a little Python program that used the GPT4 API and Tree of Thoughts to simulate a Socratic dialogue with me. However, OpenAI refused to permit me to up my API account balance with them. Researching this led me to discover that they’ve disabled prepaid cards for their API – although they do still permit prepaid cards to pay for normal user interface.

This reminds me of the days I was nearly “debanked” due to being blacklisted from my profession combined with my wife failing to qualify for any support despite her decline into Huntington’s disease. Over the years, I just kind of got used to being among those not able to get any kind of credit (financial or otherwise), so I never bothered applying for a “real” credit card.

This is rather strange behavior by OpenAI since there are now a large number of people who have been complaining about this for some time – with no resolution.