OK, maybe large language models are not dumb

As I saw people marvel at the abilities of ChatGPT and ascribe all sorts of mental states to it, I called large language models (LLMs) dumb in a recent post.

I called them dumb because they do not understand language; they generate it on the basis of a huge corpus, from which they interpolate what kind of utterance would be appropriate given some input provided by a user. The resulting behaviour is something that we would consider evidence of mental life if humans displayed it, but I think it's fallacious to infer that the behaviour indexes anything sentient in machines.

Still, it is tempting to do so. Or, as the much more knowledgeable Emily Bender said in a New York Magazine article by Elizabeth Weil:

We go around assuming ours is a world in which speakers — people, creators of products, the products themselves — mean to say what they say and expect to live with the implications of their words. This is what philosopher of mind Daniel Dennett calls “the intentional stance.” But we’ve altered the world. We’ve learned to make “machines that can mindlessly generate text,” Bender told me when we met this winter. “But we haven’t learned how to stop imagining the mind behind it.”
You Are Not a Parrot
And a chatbot is not a human. And a linguist named Emily M. Bender is very worried what will happen when we forget this.

LLMs are mindless, but that's just not the same as dumb. I used imprecise language in response to imprecise language, because saying that LLMs are dumb or lack understanding is both too generous (because those are labels for sentient beings) and too dismissive (because contemporary LLMs are sophisticated models when considered on their own terms). So sorry about that. I would rather take the stance that we can learn things about actual cognition from LLMs, without being carried away by the notion that they amount to artificial intelligence.

My jargon approach to this intermediate stance is to use the term machine cognition. That's probably still too anthropomorphizing for hardcore 'AI critics', but it works for me. It captures the distinction between artefacts and actual beings, without outright dismissing the value of those artefacts for understanding the processes that underlie the functioning of actual minds. For any given cognitive function, then, we can ask to what extent machine cognition can accurately capture it and, if so, how it pulls that off and whether that approach converges with natural cognition.

In the case of understanding, I used the Chinese room argument before to make the point that symbols need to be grounded somehow to become meaningful. The operator in the Chinese room could not ground the symbols, but the manual he used was created by people who could: they had an understanding of Chinese and they managed to codify some consequences of that understanding. The code could then be used to mediate questions and answers in Chinese. One way to think about grounding in a more physical way is by considering concepts as existing in a network that is built on top of sensorimotor representations (Mazzuca et al., 2021), which in the Chinese room case is only true for the writers of the manual.

Are LLMs mimicking such understanding, or a protoform of it? By default they do not have sensorimotor representations at their base, but since they are extensive neural networks, one might argue that LLMs are just a body away from being grounded. Much like the manual in the Chinese room argument was created by grounded agents who were then cut off from its complex operations, one might consider LLMs to be well-developed concept networks from which the sensorimotor base has been severed.

Interestingly enough, a neuroscientific experiment suggests that GPT-2 does in fact mirror processes going on in parts of human brains during text comprehension (Caucheteux & King, 2022). When processing a narrative, GPT-2's activations correlate with neural processing of the same narrative in language-associated brain regions. In addition, deeper levels of the artificial neural network appear to be tracking higher-order processing in the brain. This correspondence is even stronger when the human subjects demonstrate a better understanding of the narratives. It seems that GPT-2 converged on similar algorithmic solutions to language processing as the human brain did and that it activates features that are important to comprehension when processing narratives.
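To make this kind of comparison a bit more concrete, here is a toy sketch in Python. It is not the analysis from the cited study: the "brain" signal and the two layers' activations are synthetic, and the deeper layer is constructed to track the signal more closely, so the outcome is baked in by assumption. The names `pearson` and `make_toy_data` are made up for illustration. The only point is to show what "activations correlate with neural processing" means operationally: one number per layer, expressing how well that layer's activity lines up with the recorded signal across the words of a narrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def make_toy_data(n_words=200, seed=0):
    """Synthetic stand-ins: one 'brain' value and one activation per word.

    ASSUMPTION: the deep layer is built to follow the brain signal more
    closely than the shallow one. Real data would come from recordings
    and from a language model, not from a random number generator.
    """
    rng = random.Random(seed)
    brain = [rng.gauss(0.0, 1.0) for _ in range(n_words)]
    layers = {
        "shallow": [0.2 * b + rng.gauss(0.0, 1.0) for b in brain],
        "deep": [0.9 * b + rng.gauss(0.0, 0.5) for b in brain],
    }
    return brain, layers


brain, layers = make_toy_data()

# One correlation score per layer: how well does it track the signal?
scores = {name: pearson(acts, brain) for name, acts in layers.items()}
for name, score in scores.items():
    print(f"{name}: r = {score:.2f}")
```

Actual studies of this type fit a mapping from many activation dimensions to many recording channels and evaluate it on held-out data; a single correlation per layer is the simplest version of the same idea.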

This all makes (some) sense if we consider language to be a vehicle of thought and a large corpus a vehicle of many different thoughts. Such a corpus then represents a shared understanding of the world and statistical inferences from its structure lead to representations that reflect this understanding. The neuroimaging experiment may be picking up the same uncanny effect as human interlocutors experience when looking at LLM-powered chatbot responses – the utterances reflect understanding, regardless of whether any understanding is present in the machine.
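A deliberately tiny illustration of how statistics over a corpus can yield representations that look like understanding: the sketch below counts which words share a sentence and compares the resulting count vectors. The four-sentence corpus and the function names are invented for this example, and LLMs learn far richer representations by other means. This only shows the underlying distributional principle, namely that words used in similar contexts end up with similar representations.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini-corpus, made up for this illustration.
CORPUS = [
    "the cat eats food",
    "the dog eats food",
    "the car needs fuel",
    "the truck needs fuel",
]


def cooccurrence_vectors(sentences):
    """Map each word to counts of the other words in its sentences."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(CORPUS)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_car = cosine(vectors["cat"], vectors["car"])
print(f"cat~dog: {cat_dog:.2f}, cat~car: {cat_car:.2f}")
```

Nothing in this code knows what a cat is, yet "cat" lands closer to "dog" than to "car", simply because the people who wrote the sentences did know.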

These results also suggest that the network structure of GPT-2 mirrors the structure of lexical and semantic networks in the brain, representing world knowledge in a similar way. Could LLMs then be hooked up to other networks and supply them with knowledge representations in the same way as might be happening in human brains?

LLMs have shown the ability to represent world models on multiple occasions, not all of them purely language-related. An LLM trained on moves in a board game not only learned which moves are legal in the game, but also made use of an inferred representation of the game board (Li et al., 2022). LLMs can also contain an embedded linear model and use it to quickly learn new tasks, facilitated by the properties of the overarching network (Akyürek et al., 2022).
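One common way to check whether a network holds such an internal representation is probing: training a small classifier to read a variable (say, whether a board square is occupied) out of the network's hidden activations. The sketch below shows the idea with a simple linear probe on synthetic vectors. It is not the setup of the cited paper: the "hidden states" are generated from a made-up `DIRECTION` plus noise, so the information is present by assumption, and all names are hypothetical.

```python
import random

# ASSUMPTION: a made-up direction along which the toy "network"
# encodes whether one board square is occupied.
DIRECTION = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0, 0.5, -0.5]


def toy_hidden_state(occupied, rng, noise=0.3):
    """Synthetic hidden vector: +/- DIRECTION plus Gaussian noise."""
    sign = 1.0 if occupied else -1.0
    return [sign * d + rng.gauss(0.0, noise) for d in DIRECTION]


def make_dataset(n, rng):
    """List of (hidden_state, occupied) pairs with random labels."""
    data = []
    for _ in range(n):
        occupied = rng.random() < 0.5
        data.append((toy_hidden_state(occupied, rng), occupied))
    return data


def score(weights, bias, state):
    return sum(w * x for w, x in zip(weights, state)) + bias


def train_probe(data, epochs=10):
    """Linear probe trained with the classic perceptron update rule."""
    weights = [0.0] * len(DIRECTION)
    bias = 0.0
    for _ in range(epochs):
        for state, occupied in data:
            target = 1.0 if occupied else -1.0
            # Update only when the current probe misclassifies the example.
            if target * score(weights, bias, state) <= 0:
                weights = [w + target * x for w, x in zip(weights, state)]
                bias += target
    return weights, bias


def probe_accuracy(weights, bias, data):
    correct = sum((score(weights, bias, s) > 0) == occ for s, occ in data)
    return correct / len(data)


rng = random.Random(0)
train_set = make_dataset(300, rng)
test_set = make_dataset(200, rng)

weights, bias = train_probe(train_set)
accuracy = probe_accuracy(weights, bias, test_set)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

If a probe can decode the board from held-out activations, the information is in there somewhere. Whether the network actually uses it is a separate question, which takes interventions on the activations to answer.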

Inferring and representing key structural elements of the outside world is not a dumb thing, although it might just be a mindless thing. LLMs show that high-level feature extraction, far from being limited to the visual cortex on which artificial neural networks were based, may be a recurring motif throughout cognition and may organically lead to the emergence of task-relevant world models. Although I share Emily Bender's view that we should not look for too much of ourselves in Bing and ChatGPT, they might just offer clues as to what makes us tick.


Akyürek, E., Schuurmans, D., Andreas, J., Ma, T., & Zhou, D. (2022). What learning algorithm is in-context learning? Investigations with linear models. arXiv preprint arXiv:2211.15661.

Caucheteux, C., & King, J. R. (2022). Brains and algorithms partially converge in natural language processing. Communications Biology, 5(1), 134.

Li, K., Hopkins, A. K., Bau, D., Viégas, F., Pfister, H., & Wattenberg, M. (2022). Emergent world representations: Exploring a sequence model trained on a synthetic task. arXiv preprint arXiv:2210.13382.

Mazzuca, C., Fini, C., Michalland, A. H., Falcinelli, I., Da Rold, F., Tummolini, L., & Borghi, A. M. (2021). From affordances to abstract words: The flexibility of sensorimotor grounding. Brain Sciences, 11(10), 1304.