I spent 40% of my development time preventing an LLM from citing sources wrong. Here are the 6 failure modes I found
I built an AI research assistant for a German law firm and the retrieval pipeline took maybe 30% of the total development time. The other 70% was fighting the LLM to cite sources correctly.

Lawyers have a very specific standard for citation. You don't say "according to legal guidelines." You say "pursuant to Article 32(1)(a) DSGVO as interpreted by the EuGH in C-300/21." If the system can't do that, it's useless: no lawyer is going to trust an answer they can't verify.

Here's every citation failure mode I encountered and how I dealt with each:

Failure 1: Vague category citations. The LLM would write things like "laut professioneller Fachliteratur" (according to professional literature) instead of naming the specific document. It was essentially citing the metadata label rather than the source. Fix: explicit prompt instruction saying "NEVER paraphrase the category name as a source reference" with specific examples of what not to do.
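Alongside the prompt instruction, this kind of failure can be caught mechanically after generation. Here's a minimal sketch of such a guard; the phrase list and function name are illustrative, not from the original post:

```python
import re

# Hypothetical list of "category-as-source" phrases observed in outputs.
# In practice this would grow from logged failures.
VAGUE_CITATION_PATTERNS = [
    r"laut professioneller Fachliteratur",          # "according to professional literature"
    r"according to (professional|legal) (literature|guidelines)",
    r"laut (der )?Rechtsprechung",                  # "according to case law", no case named
]

def has_vague_citation(answer: str) -> bool:
    """Return True if the answer cites a category instead of a named document."""
    return any(re.search(p, answer, re.IGNORECASE) for p in VAGUE_CITATION_PATTERNS)
```

A flagged answer can then be regenerated with the "NEVER paraphrase the category name" rule re-emphasized in the prompt.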

Failure 2: Internal category labels leaking into output. The LLM would write "(Kategorie: High court decision)" as an inline citation. This is meaningless to the end user. Fix: prompt instruction saying "NEVER use (Kategorie: ...) as an inline citation" and requiring the actual document title or court name instead.

Failure 3: Wrong authority attribution. A finding from a high court document would get attributed to a lower court, or vice versa. This is dangerous in legal work because the authority level of the court matters enormously. Fix: prompt instruction requiring the LLM to check which category section the document appears in before attributing it, with a specific example showing the correct attribution logic.
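Since each document already sits in a known category section, the same metadata can back a post-hoc attribution check. A sketch with hypothetical document titles and a made-up two-level scheme:

```python
# Hypothetical mapping built while assembling the context: each retrieved
# document keeps the authority level of the category section it came from.
AUTHORITY_BY_DOC = {
    "BGH Urteil VI ZR 10/21": "high court",
    "LG Berlin 27 O 12/22": "lower court",
}

def attribution_ok(doc_title: str, claimed_level: str) -> bool:
    """Check a claimed authority level against the category the document was filed under."""
    return AUTHORITY_BY_DOC.get(doc_title) == claimed_level
```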

Failure 4: Flattening divergent positions. When a higher court and a lower court disagree on the same legal question, the LLM would synthesize them into one position, usually favoring whichever had clearer language rather than higher authority. Fix: explicit instruction requiring both positions to be presented separately with their source and authority level noted.
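The fix here lives entirely in the system prompt. A hedged sketch of how such a rule might be appended; the wording is illustrative, not the post author's actual prompt:

```python
# Illustrative divergence rule, appended to the system prompt.
DIVERGENCE_RULE = (
    "If sources from different authority levels disagree on the same legal "
    "question, present BOTH positions separately. For each position, name the "
    "document and its authority level. Do NOT merge disagreeing sources into "
    "a single synthesized answer."
)

def build_system_prompt(base_rules: str) -> str:
    """Append the divergence rule to the base system prompt."""
    return base_rules + "\n\n" + DIVERGENCE_RULE
```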

Failure 5: False absence claims. The LLM would confidently state "the documents contain no information about X" when the information was actually present in the context but buried in dense legal language. Fix: instruction saying "do NOT claim information is absent unless you have thoroughly verified" and suggesting the LLM say "the available excerpts may not contain the full details" instead.
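Absence claims are also easy to flag mechanically: if retrieval returned chunks at all, a confident "no information" statement deserves a second look. A sketch with illustrative patterns:

```python
import re

# Phrases asserting absence; when retrieval actually returned context,
# "not found" is usually "buried in dense language", not "absent".
ABSENCE_PATTERNS = [
    r"contain(s)? no information",
    r"keine Informationen",       # "no information"
    r"nicht enthalten",           # "not contained"
]

def suspicious_absence_claim(answer: str, retrieved_chunks: list[str]) -> bool:
    """Flag confident absence claims made despite non-empty retrieved context."""
    if not retrieved_chunks:
        return False
    return any(re.search(p, answer, re.IGNORECASE) for p in ABSENCE_PATTERNS)
```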

Failure 6: Overly emphatic language. The LLM would add reinforcement phrases like "ohne jeden Zweifel" (without any doubt) or "ganz klar" (very clearly) to legal conclusions. Lawyers find this unprofessional because legal analysis is rarely without doubt. Fix: tone instruction requiring factual and measured language, letting the sources speak for themselves.
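Tone is harder to enforce than citations, but the worst offenders are a finite phrase list that can be flagged in review. A sketch; the list is illustrative:

```python
# Reinforcement phrases flagged as unprofessional in legal analysis
# (list illustrative, kept lowercase for case-insensitive matching).
EMPHATIC_PHRASES = [
    "ohne jeden zweifel",   # "without any doubt"
    "ganz klar",            # "very clearly"
    "zweifellos",           # "undoubtedly"
]

def emphatic_hits(answer: str) -> list[str]:
    """Return any overly emphatic phrases found in the answer, case-insensitively."""
    lower = answer.lower()
    return [p for p in EMPHATIC_PHRASES if p in lower]
```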
