Reasoning but to err…

Large language models (LLMs) like ChatGPT are tools, and to use them productively it’s necessary to understand what they can and cannot do. An LLM reads a vast corpus of text and trains its internal associative memory to predict, based upon what it has already seen (the “prompt”), the most probable text (token) to follow, judged by the patterns in the corpus it has digested.
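
To make “predict the next token” concrete, here is a minimal sketch using the Hugging Face transformers library with GPT-2 as a stand-in model. The model choice, prompt, and variable names are purely illustrative (ChatGPT itself is not served this way), but the principle is the same: the model turns a prompt into a probability distribution over what token comes next.

```python
# Minimal sketch: ask a small language model which token it thinks
# comes next after a prompt. GPT-2 is used only as an example model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Is this case valid legal precedent? The answer is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits   # shape: [batch, sequence, vocabulary]

# Probability distribution over the single token that would follow the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, 5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}  p = {prob.item():.3f}")
```

Everything the model “says” is generated by repeating this step: pick a next token, append it to the prompt, and predict again.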

That is all it does. It understands nothing, does not learn, and reacts entirely based upon the content of the prompt (which, in a chat application, usually includes your recent queries and its responses). If the most common response in the billions of words it has read to “is this case valid legal precedent” is affirmative, then that’s the reply it’s going to give.
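
As an illustration of why a chat session’s “memory” is nothing but the prompt, here is a toy sketch (not OpenAI’s actual code; the roles and messages are made up) of how a chat front end might rebuild the context on every turn:

```python
# Toy sketch of how a chat front end gives the model its apparent memory:
# the recent conversation is pasted back into the prompt on every turn,
# and the model simply continues the text.
conversation = [
    ("user", "Is this case valid legal precedent?"),
    ("assistant", "Yes, it is widely cited..."),
    ("user", "Can you give me the citation?"),
]

def build_prompt(history):
    # The model keeps no state between calls; anything not in this string
    # is, as far as it is concerned, forgotten.
    lines = [f"{role}: {text}" for role, text in history]
    lines.append("assistant:")   # invite the model to continue from here
    return "\n".join(lines)

print(build_prompt(conversation))
```
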

What, then, is it useful for? First a brief digression: psychometricians distinguish two kinds of intelligence: “fluid intelligence” and “crystallised intelligence”. Fluid intelligence is the ability to understand and reason, which is largely independent of prior learning. Tests to measure fluid intelligence include completing series of numbers, solving word problems in arithmetic, identifying patterns in figures, etc. Crystallised intelligence is what a person has learned. It is measured by tests of vocabulary, analogies, and knowledge of general information.

Marc Andreessen, who uses ChatGPT regularly as a research tool, estimated in a conversation posted here on 2023-07-11, “Marc Andreessen on Why Artificial Intelligence Will Be Hugely Beneficial”, that ChatGPT’s fluid intelligence was around IQ 130 (think physicians, surgeons, lawyers, engineers), but that its crystallised intelligence dwarfed that of any human who has ever lived, because a human lifetime is too short to read more than a tiny fraction of what was used to train ChatGPT, and its digital memory is more reliable at recalling all of that text than a human meat computer.

The best way to approach ChatGPT as a tool is to think of it as an oracle who has read just about everything ever written which is available in machine-readable form. If you ask it for references on a topic, arguments for and against an issue, a reading list to explore a topic, or to simplify a complex block of text, it often outperforms any human in the breadth of its knowledge. But if you ask it for reasoning from that knowledge, it performs no better than a typical bright human without subject expertise or experience. You wouldn’t ask a surgeon with an IQ of 130 questions that required reasoning from legal precedents, and ChatGPT is not only unqualified to think like a lawyer, it is much more inclined to bullshit its way to an answer, unlike the surgeon who would probably respond, “Why are you asking me that? Go ask a lawyer.”
