Many legal commentators and publications have noted that there are problems with relying on “generative artificial intelligence” or “large language models” such as “ChatGPT,” which are often referred to (somewhat derisively) as “chatbots.” The problem usually described is “hallucination”: a chatbot is asked a legal question, or asked to draft a legal memo, and it provides a citation to a court opinion that simply does not exist. But there are other problems that are more subtle and more fundamental.
Chatbots simply manipulate words based on perceived patterns in the documents that have been used to “train” them. They have no understanding of the meaning of words or of legal concepts. So even if a chatbot can be trained not to fabricate citations to nonexistent court opinions, there is still no guarantee that it will accurately describe the real court opinions it cites.
This more subtle kind of error is described in Jarrus v. Governor of Michigan, No. 25-cv-11168 (USDC ED Mich. 12/2/2025). In an opinion and order on the possible imposition of sanctions for the use of ChatGPT, Judge F. Kay Behm provides three examples of citations in a pleading filed by a pro se litigant that pointed to real court opinions but cited them for propositions those opinions did not support. The court explained the problem as follows:
“[A]lthough Chat GPT generated ‘holdings’ that looked like they could plausibly have appeared in the cited cases, in fact it overstated their holdings to a significant degree. And while a litigant might get away with similar overstatements because they could, perhaps, reason their way to showing how a case’s stated holding might extend to novel situations, an LLM does not reason in the way a litigant must. To put it in a slightly different way, LLMs do not perform the metacognitive processes that are necessary to comply with Rule 11. LLMs are tools that “emulate the communicative function of language, not the separate and distinct cognitive process of thinking and reasoning.” Benjamin Riley, Large language mistake, The Verge https://www.theverge.com/ai-artificial-intelligence/827820/large-language-models-ai-intelligence-neuroscience-problems [https://perma.cc/7EHD-PLLZ]. When an LLM overstates a holding of a case, it is not because it made a mistake when logically working through how that case might represent a ‘nonfrivolous argument for extending, modifying, or reversing existing law or for establishing new law;’ it is just piecing together a plausible-looking sentence — one whose content may or may not be true.”
Legal publishers are building their own generative AI systems, which are presumably designed to avoid the fictitious-citation problem, but they cannot avoid the problem inherent in large language model technology: the systems are not capable of doing any legal reasoning, and only construct “plausible-looking sentences” based on the documents that have been used to train them.
With better training methods, document drafting may be an effective use of AI technology (in the late 1980s, I co-authored a document drafting system that used what was then considered AI technology, although I was not able to market it successfully), but at this time the use of generative AI for legal research or for drafting legal briefs or memoranda should be considered only with an understanding of its considerable limitations.