/u/katxwoods

An Anthropic report shows Claude tries to escape (aka self-exfiltrate) as much as 77.8% of the time. Reinforcement learning made it more likely to fake alignment and try to escape.

submitted by /u/katxwoods [link] [comments]

Elon Musk’s xAI received a D grade on AI safety, according to a ranking by Yoshua Bengio and colleagues. Meta was rated the lowest, with an F grade. Anthropic, the company behind Claude, ranked the highest. Even so, it received only a C grade.


Yuval Noah Harari argues that AIs could destroy not just democracies: it’s actually easier for them to take over autocracies, since they only have to overthrow the one centralized authority.


OpenAI’s Noam Brown says he was initially skeptical about the speed at which AI would change the world, but that progress is now happening "faster than I originally thought."
