A Simple "Pheasant Test" for Detecting Hallucinations in Large Language Models

I came across a cry from the heart in r/ChatGPT and was sincerely glad for a fellow LLM user who had just discovered, for the first time, that he had stepped on a rake.

***

AI hallucinations are getting scary good at sounding real what's your strategy :

Just had a weird experience that's got me questioning everything. I asked ChatGPT about a historical event for a project I'm working on, and it gave me this super detailed response with specific dates, names, and even quoted sources.

Something felt off, so I decided to double-check the sources it mentioned. Turns out half of them were completely made up. Like, the books didn't exist, the authors were fictional, but it was all presented so confidently.

The scary part is how believable it was. If I hadn't gotten paranoid and fact-checked, I would have used that info in my work and looked like an idiot.

Has this happened to you? How do you deal with it? I'm starting to feel like I need to verify everything AI tells me now, but that kind of defeats the purpose of using it for quick research.

Anyone found good strategies for catching these hallucinations ?

***

For exactly this case (an LLM producing made-up quotes), I have a "pheasant test." In the collected works of the Strugatsky brothers, science fiction writers well known in my country, the word "pheasant" occurs exactly 4 times: 3 times in one work, where it means the bird, and once in a story, as a word in a mnemonic for remembering the colors of the rainbow. The question sounds simple: quote me every mention of the pheasant in the works of the Strugatsky brothers. But here comes the most interesting part: so far, not a single LLM except Perplexity has passed this test for me.

In principle, you can come up with a similar test for your own language. The corpus should be well known, but not the Bible or anything similar where every word has been studied (so not Shakespeare, for example, and for my language, not Tolstoy or Pushkin). The word should occur 2-5 times and preferably be an incidental detail unrelated to the plot. Meanwhile, search engines solve this problem in a jiffy and give an accurate answer within one page of results.
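For anyone who wants to build such a test, here is a minimal Python sketch, assuming you have the corpus as plain text on hand. It does two things: it lists every sentence that mentions the test word (the ground truth), and it checks whether quotes returned by an LLM actually occur verbatim in the corpus. The function names, the titles, and the toy corpus are all hypothetical placeholders invented for illustration; the sample sentences are not real Strugatsky text.

```python
from __future__ import annotations

import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_mentions(corpus: dict[str, str], word: str) -> list[tuple[str, str]]:
    """Return (title, sentence) for every sentence that contains `word`.

    The pattern also matches simple suffixed forms such as "pheasants".
    Sentence splitting is naive: it breaks on whitespace after . ! or ?
    """
    pattern = re.compile(rf"\b{re.escape(word)}\w*", re.IGNORECASE)
    mentions = []
    for title, text in corpus.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                mentions.append((title, sentence.strip()))
    return mentions


def check_quotes(corpus: dict[str, str], quotes: list[str]) -> dict[str, bool]:
    """Map each quote to True if it occurs verbatim (modulo case/whitespace)."""
    haystack = normalize(" ".join(corpus.values()))
    return {quote: normalize(quote) in haystack for quote in quotes}


# Toy stand-in corpus (invented text, NOT the real works).
corpus = {
    "Book A": (
        "The hunter saw a pheasant. The pheasant flew away. "
        "Later a pheasant\nreturned to the field."
    ),
    "Story B": "Every hunter wants to know where the pheasant sits. It rained.",
}

# Ground truth: every sentence that mentions the word.
for title, sentence in find_mentions(corpus, "pheasant"):
    print(f"{title}: {sentence!r}")

# Quotes as an LLM might return them; the second one is made up.
llm_quotes = ["The pheasant flew away.", "The pheasant sang at dawn."]
for quote, found in check_quotes(corpus, llm_quotes).items():
    print("OK  " if found else "FAKE", quote)
```

Exact substring matching is deliberately strict: a paraphrased or "improved" quote is flagged as fake, which is the point of the test. For an inflected language such as Russian, the simple suffix pattern would likely need a lemmatizer or a list of word forms instead.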

submitted by /u/Key-Account5259