
Key points on large language models


Over the past months, I have written a lot about large language models. These writings have been rather essayistic attempts to find out where we stand with regard to reasoning machines. I think it's time to become more systematic and to explore specific types of reasoning. But before I do, let me sum up what I consider the key points from the posts so far.

  1. The first rule of machine cognition is that you should never infer human-like cognitive processes from human-like behaviour by machines.
  2. Large language models (LLMs) contain a model of the structure of language, which is not exactly the same as a model of the world that language refers to.
  3. Because LLMs model the structure of language quite well and have learnt this structure from a large, human-written corpus, they can generate utterances that make sense to humans.
  4. Since humans infer all sorts of cognitive traits from the utterances of other humans (e.g. creativity, intelligence, knowledgeability), it is tempting for them to do the same in the face of LLM-generated utterances.
  5. To maintain clarity about the performance of reasoning machines, it is useful to distinguish the utterance level from the algorithmic level. It can be the case that a machine shows reasoning at the utterance level, but does not meet the requirements for reasoning at the algorithmic level.
  6. LLMs are tools to create linguistic forms and they vastly outperform humans in this domain. They can perform tasks at the utterance level by leveraging their capacity for generating linguistic forms at the algorithmic level, regardless of whether humans would perform such tasks in a similar way.
  7. The particular mechanisms at the algorithmic level (in this case, a transformer-based artificial neural network trained on a large corpus of human-created information and moulded by reinforcement learning from human feedback) constrain the performance space of the LLM at the utterance level.
  8. Most importantly, the algorithmic level of LLMs makes use of non-grounded representations: meaning exists only as the relations between words (or tokens, actually), not as the relations between words and their referents (see the small sketch after this list).
  9. There have been attempts to ground LLMs by integrating them into machinery that is causally connected to a part of the world (e.g. the Diplomacy-playing Cicero). These larger machines arguably understand the causal structure of this part of the world but are currently limited to a narrow task domain and require explicit instruction.
  10. As a consequence of the above, LLMs should be expected to perform well on reasoning tasks at the utterance level if the task can be done either through the mastery of linguistic form or through the direct use of corpus information. They should however not perform well on tasks that require an understanding of what the forms mean and how they are (causally) related.
  11. Telling apart which tasks can be done by mimicking form only and which require understanding is not trivial.
  12. Deploying LLMs is fraught with ethical concerns. The corpus was taken from humans who could not give consent, the training is energy-intensive, and cleaning the utterances of LLMs is a traumatising job for underpaid workers.
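
To make point 8 a bit more concrete, here is a minimal sketch of what "meaning as relations between tokens" amounts to. The vectors below are made-up toy embeddings, not taken from any actual model; in a real LLM they would be high-dimensional and learned from the corpus. The point is that the model only ever has access to the geometric relations between such vectors, never to the cats and carburettors the words refer to.

```python
import numpy as np

# Toy, hand-made embedding vectors (hypothetical values, for illustration only).
# In a real LLM these would be high-dimensional vectors learned from the corpus.
embeddings = {
    "cat":         np.array([0.90, 0.80, 0.10]),
    "kitten":      np.array([0.85, 0.70, 0.15]),
    "carburettor": np.array([0.10, 0.20, 0.90]),
}

def cosine_similarity(a, b):
    """Similarity as the angle between two vectors: the only 'meaning' available here."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The model can tell that "cat" is more closely related to "kitten" than to
# "carburettor" purely from vector geometry, without ever having seen a cat.
print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))
print(cosine_similarity(embeddings["cat"], embeddings["carburettor"]))
```

Everything such a system "knows" about cats is contained in this kind of relational structure; grounding, as in point 9, would mean connecting these representations to perception of or action on the world itself.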