Some quotes from the author about the paper that I found insightful:
Most prior hallucination detection work has focused on simple factual questions with short answers, but real-world LLM usage increasingly involves long and complex responses where hallucinations are much harder to detect.
The detector was trained on a large-scale dataset of 40k+ annotated long-form samples across 5 different open-source models, focusing on entity-level hallucinations (names, dates, citations), which naturally map to token-level labels.
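To make the entity-to-token mapping concrete, here is a minimal sketch (not from the paper) of how character-span entity annotations could be turned into per-token labels. The annotation schema (character offsets plus a hallucinated flag) and the GPT-2 tokenizer are assumptions I made for the example; the paper's models and labeling code may differ.

```python
# Minimal sketch: map character-span entity annotations to token-level labels.
# Assumes annotations carry character offsets and a "hallucinated" flag; the
# tokenizer is GPT-2 purely for illustration (any fast tokenizer with offset
# mapping would work).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def token_labels(text: str, entities: list[dict]) -> list[int]:
    """Label each token 1 if it overlaps a hallucinated entity span, else 0."""
    enc = tokenizer(text, return_offsets_mapping=True, add_special_tokens=False)
    labels = [0] * len(enc["input_ids"])
    for ent in entities:
        if not ent["hallucinated"]:
            continue
        for i, (start, end) in enumerate(enc["offset_mapping"]):
            if start < ent["end"] and end > ent["start"]:  # spans overlap
                labels[i] = 1
    return labels

# Example: a fabricated citation flagged in a generated sentence.
text = "The result was first shown by Smith et al. (2019)."
entities = [{"start": 30, "end": 49, "hallucinated": True}]
print(token_labels(text, entities))
```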
They were able to automate generation of the dataset with closed-source models, which circumvented the data problems of previous work.
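For intuition about what such an annotation pipeline could look like, here is a minimal sketch assuming an OpenAI-style chat API; the model name, prompt, and output schema are all my guesses, not the paper's actual setup. The resulting spans could then feed the token-labeling sketch above.

```python
# Sketch of automated entity annotation with a closed-source model.
# Model name, prompt, and JSON schema are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "List every entity (name, date, citation) in the passage below as a JSON "
    "array of objects with keys entity, start, end, and hallucinated "
    "(true if the entity is not supported by reliable sources)."
)

def annotate(passage: str) -> list[dict]:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed annotator model
        messages=[{"role": "user", "content": PROMPT + "\n\n" + passage}],
    )
    # A real pipeline would validate the model's output before trusting it.
    return json.loads(resp.choices[0].message.content)
```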
arXiv Paper Title: Real-Time Detection of Hallucinated Entities in Long-Form Generation