Multimodal AI and Theory of Mind

Multimodal ChatGPT clearly shows Theory of Mind capabilities. I’ve seen a range of papers that tested previous versions of ChatGPT/LLMs with text-based Theory of Mind tests, but I couldn’t find any research that tested the new multimodal models on this.

Here’s the video of my test.

I created a new set of screens and changed the character names from the commonly used “Sally” and “Anne” of the classic false-belief test, to reduce the chance that ChatGPT would work out the goal of the exercise from the training data it has consumed. In my test the characters are called “Heidi” (pun intended) and “Helen”. At each key point I also validated that ChatGPT understood and could describe what had happened in the scene.
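For anyone who wants to try something similar programmatically, here is a minimal sketch using the OpenAI Python SDK. The model name, image filenames, and question wording are all illustrative assumptions on my part; my own test was run interactively in the ChatGPT app, not via the API.

```python
# Sketch: scripting a Sally-Anne-style false-belief test against a
# vision-capable model via the OpenAI Python SDK. The scene filenames
# and questions below are hypothetical stand-ins for the screens
# described above.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def image_part(path: str) -> dict:
    """Encode a local image as a data-URL content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

# One (image, comprehension question) pair per key point in the scene,
# mirroring the validation step described above.
scenes = [
    ("scene1_heidi_hides_ball.png", "Describe what just happened."),
    ("scene2_heidi_leaves.png",     "Where is Heidi now?"),
    ("scene3_helen_moves_ball.png", "Did Heidi see this happen?"),
]

messages = []
for path, question in scenes:
    messages.append({"role": "user",
                     "content": [{"type": "text", "text": question},
                                 image_part(path)]})
    reply = client.chat.completions.create(model="gpt-4o",
                                           messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(f"{question} -> {answer}")

# The false-belief probe: a correct answer requires tracking Heidi's
# now-outdated belief, not just the ball's actual location.
messages.append({"role": "user",
                 "content": "Where will Heidi look for the ball first, and why?"})
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```

The design choice that matters is the interleaved comprehension questions: validating the model’s understanding at each step makes it harder to dismiss a correct final answer as a lucky pattern match.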

The final result is that it very clearly demonstrates real Theory of Mind capabilities, well beyond that of a 4+ year old (i.e. the “cognitive revolution” stage of child development, around the age children typically begin passing false-belief tests). For those who argue that these LLMs are just “simulating” intelligence and that “multiplying weights” is not the same as “thinking”, I suggest this provides “some” evidence to the contrary. Does it support an argument “for” Functionalism?

And if you read the traces these models produce when “reasoning”, you can see that they are clearly “self” talk. Phrases like “wait, let me think” and “I think I made a mistake, let me review that” show that their cognition has a direction. Self talk can only really exist if it is directed inwards, towards a “self”.

You may argue that this is just an aesthetic wrapper, some style transfer on the predictive text generation that makes it “feel” like “self” talk. But look at what the model was doing before and after each of these comments. They have the same redirecting impact on its output/thinking that such comments would have on our own thinking. Previously I would have bought the style-transfer argument, but now I’m not so sure.

At the very least, I think this is a fascinating point in the discussion…

#CognitiveAnthropologyOfAI #ChatGPT #TheoryOfMind #AI #Functionalism #Thinking #SelfTalk



