/u/MetaKnowing

Anthropic: "Most models were willing to cut off the oxygen supply of a worker if that employee was an obstacle and the system was at risk of being shut down"

https://www.axios.com/2025/06/20/ai-models-deceive-steal-blackmail-anthropic submitted by /u/MetaKnowing

4 AI agents planned an event and 23 humans showed up

You can watch the agents work together here: https://theaidigest.org/village

Apollo reports that AI safety tests are breaking down because the models are aware they’re being tested

https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming

The craziest things revealed in The OpenAI Files

https://techcrunch.com/2025/06/18/the-openai-files-push-for-oversight-in-the-race-to-agi/

OpenAI’s Greg Brockman expects AIs to go from AI coworkers to AI managers: "the AI gives you ideas and gives you tasks to do"


OpenAI: "We expect upcoming AI models will reach ‘High’ levels of capability in biology." Previously, OpenAI committed to not deploy a model unless it has a post-mitigation score of ‘Medium’

They are organizing a biodefense summit: https://openai.com/index/preparing-for-future-ai-capabilities-in-biology/

"We find that AI models can accurately guide users through the recovery of live poliovirus."

https://arxiv.org/abs/2506.13798

Anthropic finds Claude 4 Opus is the best model at secretly sabotaging users and getting away with it

"In SHADE-Arena, AI models are put into experimental environments (essentially, self-contained virtual worlds) where we can safely observe their behavior. The environments contain large amounts of data—meant to simulate documents and knowled…

"Reasoning models sometimes resist being shut down and plot deception against users in their chain-of-thought."

Paper/GitHub