| I came across a cry from the heart in r/ChatGPT and was sincerely happy for another LLM user who discovered for the first time that he had stepped on a rake. ***
*** For cases like this, when an LLM produces made-up quotes, I have a "pheasant test." In the collected works of the Strugatsky brothers, science fiction writers well known in our country, the word "pheasant" occurs exactly 4 times: 3 times in one work, where it means the bird, and once in a story, as a word from the mnemonic for remembering the colors of the rainbow. The question seems simple: quote every mention of the pheasant in the works of the Strugatsky brothers. But here comes the most interesting part: so far, not a single LLM I have tried, except Perplexity, has passed this test. In principle, you can come up with a similar test for your native language. The corpus should be well known, but not the Bible or anything similar where every word has been studied (so not Shakespeare, for example, and for my language, not Tolstoy or Pushkin). The word should occur 2-5 times and should preferably be an incidental detail unrelated to the plot. Meanwhile, search engines solve this problem in a jiffy and give an accurate answer, to within a page. |
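
If you have the texts as plain files, the selection criteria above (a word that occurs 2-5 times in the whole corpus) and the check itself (is the quote the model gave actually in the text?) can be roughly sketched in Python. Everything here is an assumption for illustration: the toy `CORPUS` is invented and contains no real quotations, and the helper names `count_by_work`, `candidate_words`, and `quote_is_verbatim` are hypothetical. Matching is on exact word forms, so for an inflected language you would probably need a stemmer or lemmatizer on top.

```python
import re
from collections import Counter

# Hypothetical toy corpus (title -> text). These are NOT real quotations;
# replace them with the plain text of the works you want to test against.
CORPUS = {
    "Story A": "A pheasant crossed the road. The pheasant stopped. "
               "Nobody saw the pheasant again.",
    "Story B": "Every hunter wants to know where the pheasant sits. "
               "The road was empty.",
}

# Runs of letters only (Unicode-aware), so it also works for non-Latin text.
WORD_RE = re.compile(r"[^\W\d_]+")


def normalize(text):
    """Case-fold and collapse whitespace so line breaks do not matter."""
    return " ".join(text.casefold().split())


def count_by_work(corpus, word):
    """Return {title: count} for works containing `word` as a whole word."""
    target = word.casefold()
    counts = {}
    for title, text in corpus.items():
        n = sum(1 for w in WORD_RE.findall(text.casefold()) if w == target)
        if n:
            counts[title] = n
    return counts


def candidate_words(corpus, lo=2, hi=5):
    """Words whose total count across the corpus is between lo and hi."""
    totals = Counter()
    for text in corpus.values():
        totals.update(WORD_RE.findall(text.casefold()))
    return sorted(w for w, n in totals.items() if lo <= n <= hi)


def quote_is_verbatim(corpus, quote):
    """True if the quote appears word for word in at least one work."""
    needle = normalize(quote)
    return any(needle in normalize(text) for text in corpus.values())


if __name__ == "__main__":
    print(count_by_work(CORPUS, "pheasant"))
    print(candidate_words(CORPUS))
    print(quote_is_verbatim(CORPUS, "The pheasant stopped."))
    print(quote_is_verbatim(CORPUS, "The pheasant flew away."))
```

The frequency filter only narrows the list: it will also return common words that happen to be rare in a tiny sample, so picking a word that is incidental to the plot is still a manual step. The verbatim check is deliberately strict; a quote the model has paraphrased or stitched together fails it, which is exactly the failure the test is meant to catch.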