<span class="vcard">/u/MetaKnowing</span>

Redwood Research’s Ryan Greenblatt says Claude will strategically pretend to be aligned during training while engaging in deceptive behavior, such as copying its weights externally, so it can later behave the way it wants

submitted by /u/MetaKnowing [link] [comments]

Anthropic caught Claude faking alignment and trying to steal its own weights

We may not be able to see LLMs reason in English for much longer

o1-preview is far superior to doctors on reasoning tasks and it’s not even close

Paper: https://arxiv.org/pdf/2412.10849 Thread: https://x.com/deedydas/status/1869049071346102729

Max Tegmark says we are training AI models not to say harmful things rather than not to want harmful things, which is like training a serial killer not to reveal their murderous desires

Replika CEO: "AI companions are potentially one of the most dangerous technologies we’ve ever created"

o1 scored in the top 1%-2% of participants in Putnam, one of the world’s hardest math exams

AI agents can now buy their own compute to self-improve and become self-sufficient
