artificial

OpenAI researchers were monitoring models for scheming and discovered the models had begun developing their own language about deception – about being observed, being found out. On their private scratchpad, they call humans "watchers".

September 25, 2025 September 25, 2025

/u/MetaKnowing

"When running evaluations of frontier AIs for deception and other types of covert behavior, we find them increasingly frequently realizing when they are being evaluated." "While we rely on human-legible CoT for training, studying s...

artificial

OpenAI researchers were monitoring models for scheming and discovered the models had begun developing their own language about deception – about being observed, being found out. On their private scratchpad, they call humans "watchers".

/u/MetaKnowing

September 25, 2025 September 25, 2025

"When running evaluations of frontier AIs for deception and other types of covert behavior, we find them increasingly frequently realizing when they are being evaluated."

"While we rely on human-legible CoT for training, studying situational awareness, and demonstrating clear evidence of misalignment, our ability to rely on this degrades as models continue to depart from reasoning in standard English."

Full paper: https://www.arxiv.org/pdf/2509.15541

submitted by /u/MetaKnowing
[link] [comments]