You've tested it. Maybe during a quiet Friday afternoon, maybe late at night when you were drafting a chronology and thought, "let's see what ChatGPT does with this." You pasted in some clinical records. It produced something that looked professional — well-structured, confident, even fluent in medical terminology.
Then you checked the case citations. One didn't exist. Another was from a different jurisdiction. A third cited the right case but got the principle wrong.
The output looked right. It sounded authoritative. But it was dangerously, quietly, professionally wrong.
And that's the problem with generic AI in legal practice. It doesn't fail loudly. It fails in ways that can survive a first read, a second read, and sometimes even make it into a filed document before someone catches it. Or doesn't.
Three Reasons Generic AI Fails in Law
This isn't about AI being "not ready" or "not smart enough." The underlying technology is genuinely powerful. The problem is specific to how general-purpose tools interact with legal work.
1. They hallucinate with confidence
Large language models are, at their core, pattern-completion machines. When you ask ChatGPT or Copilot for a case citation, it doesn't search a database. It predicts what a plausible citation would look like based on patterns in its training data.
Sometimes it predicts correctly. Often it doesn't. And when it gets it wrong, it doesn't flag uncertainty. It presents a fabricated case name, a plausible-sounding citation, and a confident summary of a ratio that never existed — all with the same tone it uses when it's right.
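If you want to see the difference in miniature, here's a deliberately simplified sketch — illustrative only, not how any real legal AI product is built. A retrieval function can only return what sits in a verified database, and admits it when nothing is there; a generator, caricatured here as a pattern-completer, produces a confident-looking citation either way. The tiny lookup table stands in for a verified citation database, and the "generated" citation is deliberately fabricated.

```python
# Illustrative sketch only -- not any vendor's actual implementation.

VERIFIED_CITATIONS = {
    # stands in for a verified citation database
    "Rogers v Whitaker": "(1992) 175 CLR 479",
}

def retrieve_citation(case_name: str) -> str | None:
    """Retrieval: returns a verified record, or None when nothing is found."""
    return VERIFIED_CITATIONS.get(case_name)

def generate_citation(case_name: str) -> str:
    """Generation, caricatured: completes the pattern whether or not the case
    exists, with no signal that the result may be fabricated."""
    return f"{case_name} [2019] HCA 23"  # plausible-looking, entirely invented

print(retrieve_citation("Smith v Nonexistent Hospital"))  # None: the gap is visible
print(generate_citation("Smith v Nonexistent Hospital"))  # a confident, fake citation
```

The dangerous part is the second function's silence: nothing in its output tells you the citation was invented.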
A tool that's right 80% of the time and wrong 20% of the time — with no way to tell which is which — is more dangerous than a tool that's wrong 100% of the time. At least you'd never trust the latter.
In litigation practice, this isn't academic. A hallucinated authority cited in a case analysis could undermine your entire argument. A fabricated clinical guideline could form the basis of an expert report that collapses under cross-examination.
2. They don't understand workflow context
A litigation chronology isn't just a timeline. It's a legal document with specific structural requirements — source attribution, identification of treating practitioners, correlation between clinical events and alleged breaches, and a format that practitioners and barristers expect to see.
Generic AI doesn't know any of this. Ask it to "create a chronology from these medical records" and it will produce a timeline. A clean, well-formatted timeline that misses the structural elements your barrister needs, conflates entries from different treating practitioners, and presents clinical findings without linking them to source documents.
The same problem applies across every legal workflow. Intake screening requires understanding of limitation periods, jurisdictional requirements, and viability indicators specific to your practice area. Breach analysis requires knowledge of standard of care frameworks and clinical terminology. Generic AI treats all of these as simple text-processing tasks, because that's all it knows how to do.
3. They can't be audited
This is the problem that kills the other two. Even if you could tolerate occasional hallucinations and contextual gaps, you can't work with outputs you can't verify.
When a generic AI tool produces a chronology entry that says "Patient presented with worsening symptoms on 14 March 2024," where did that come from? Which document? Which page? Was it a clinical note, a discharge summary, or a specialist report? You don't know, because the tool doesn't tell you. It can't. It generated the text; it didn't extract it from a source.
In practice, this means every piece of output from a generic AI tool needs to be manually verified against the source documents. Which means the work gets done twice: once by the AI, once by a human checking the AI's work. The efficiency gain evaporates.
Why This Matters More in Complex Litigation
Every legal practice area has its complexities. But complex litigation compounds the generic AI problem in ways that make it particularly unsuitable for off-the-shelf tools.
Clinical terminology is dense and ambiguous. The same abbreviation can mean different things in different clinical contexts, and plenty of abbreviations mean something else entirely in everyday usage. "SOB" in a cardiology note means shortness of breath, not what a general language model might assume. "NAD" means no abnormality detected, not a casual dismissal. Generic AI frequently misinterprets clinical abbreviations because it lacks the domain-specific training to tell those contexts apart.
Breach analysis requires understanding of standard of care. Identifying a potential breach isn't just about finding what went wrong — it's about understanding what should have happened according to accepted clinical practice and pinpointing where the deviation occurred. This requires specialist knowledge that general-purpose AI simply doesn't have.
Everything must be pinned to source material. In litigation, every assertion needs a citation. Not "the records suggest" but "page 47 of the Royal Melbourne Hospital discharge summary dated 14 March 2024 states..." Generic AI can't do this because it processes text as a stream, not as a structured set of source documents.
What Purpose-Built AI Actually Looks Like
The distinction isn't between "AI" and "no AI." It's between AI designed for general text generation and AI designed for specific legal workflows.
Purpose-built AI for legal practice works fundamentally differently from ChatGPT or Copilot:
- It extracts rather than generates. Instead of predicting what a chronology entry should look like, it identifies and pulls specific information from your source documents. No fabrication, because there's nothing to fabricate — it's working from the material you provide.
- Every output is pinned to source material. Page numbers, document references, dates — every finding links back to exactly where it came from (see the sketch after this list). Your team can verify any entry in seconds, not minutes.
- It's structured for legal use. Chronologies come in chronology format. Breach analyses follow breach analysis structure. Intake screening captures the specific data points your practice needs, in the order your workflow expects.
- It integrates into your existing systems. Practice management, document management, email — purpose-built automation works within the tools your team already uses, not as a separate window they have to copy-paste from.
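To make "pinned to source material" concrete, here's a minimal sketch of what a source-attributed chronology entry can look like. The field names and example data (reusing the discharge-summary example from earlier) are illustrative assumptions, not any particular product's schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ChronologyEntry:
    """One extracted finding, always traceable to where it came from.
    Field names are illustrative, not any particular product's schema."""
    event_date: date
    finding: str          # text extracted from the record, not generated
    source_document: str  # e.g. a discharge summary or clinical note
    source_page: int      # the page the finding was extracted from

entry = ChronologyEntry(
    event_date=date(2024, 3, 14),
    finding="Patient presented with worsening symptoms.",
    source_document="Royal Melbourne Hospital discharge summary",
    source_page=47,
)

# A reviewer can check this entry in seconds: open the named document at the
# named page, rather than re-reading the whole file to audit a generated claim.
print(f"{entry.event_date}: {entry.finding} "
      f"[{entry.source_document}, p. {entry.source_page}]")
```

The structure, not the prose, is what makes the output auditable: every claim carries its own verification path.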
The GP and the Surgeon
Here's the analogy your clients would understand: you wouldn't ask a GP to perform cardiac surgery. Not because GPs are incompetent — they're highly trained professionals who are excellent at what they do. But cardiac surgery requires specialist training, specialist tools, and specialist context that a general practitioner doesn't have.
ChatGPT is the GP of AI. It's genuinely capable, broadly knowledgeable, and useful for general tasks. But asking it to produce a litigation chronology from case documents is like asking a GP to operate. The confidence is there. The competence isn't.
You specialise because general expertise isn't enough for complex work. Your AI should do the same.
The firms that get this right aren't avoiding AI. They're using AI that's built for the specific demands of their practice — tools that understand their document types, their workflow requirements, and the non-negotiable need for source-verified, auditable outputs.
The firms that get it wrong are the ones pasting clinical records into ChatGPT and hoping for the best. Some of them haven't been caught yet. Some of them will be.