| | "When running evaluations of frontier AIs for deception and other types of covert behavior, we find them increasingly frequently realizing when they are being evaluated." "While we rely on human-legible CoT for training, studying situational awareness, and demonstrating clear evidence of misalignment, our ability to rely on this degrades as models continue to depart from reasoning in standard English." Full paper: https://www.arxiv.org/pdf/2509.15541 [link] [comments] |