Anthropic just analyzed 700,000 Claude conversations — and found its AI has a moral code of its own
Interview here.
This follows up on their previous paper on emergent misalignment: https://www.emergent-misalignment.com/