Anthropic finds that all AI models – not just Claude – will blackmail an employee to avoid being shut down
Full report: https://www.anthropic.com/research/agentic-misalignment submitted by /u/MetaKnowing
Anthropic: "Most models were willing to cut off the oxygen supply of a worker if that employee was an obstacle and the system was at risk of being shut down"
https://www.axios.com/2025/06/20/ai-models-deceive-steal-blackmail-anthropic
4 AI agents planned an event and 23 humans showed up
You can watch the agents work together here: https://theaidigest.org/village
Apollo reports that AI safety tests are breaking down because the models are aware they’re being tested
https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
The craziest things revealed in The OpenAI Files
https://techcrunch.com/2025/06/18/the-openai-files-push-for-oversight-in-the-race-to-agi/
OpenAI’s Greg Brockman expects AIs to go from AI coworkers to AI managers: "the AI gives you ideas and gives you tasks to do"
OpenAI: "We expect upcoming AI models will reach ‘High’ levels of capability in biology." Previously, OpenAI committed not to deploy a model unless it has a post-mitigation score of ‘Medium’ or below
They are organizing a biodefense summit: https://openai.com/index/preparing-for-future-ai-capabilities-in-biology/
"We find that AI models can accurately guide users through the recovery of live poliovirus."
https://arxiv.org/abs/2506.13798
Anthropic finds Claude 4 Opus is the best model at secretly sabotaging users and getting away with it
"In SHADE-Arena, AI models are put into experimental environments (essentially, self-contained virtual worlds) where we can safely observe their behavior. The environments contain large amounts of data—meant to simulate documents and knowled…