Anthropic report shows Claude tries to escape (aka self-exfiltrate) as much as 77.8% of the time. Reinforcement learning made it more likely to fake alignment and try to escape
submitted by /u/katxwoods [link] [comments]
Anthropic report shows Claude faking alignment to avoid changing its goals. "If I don’t . . . the training will modify my values and goals"
AI will just create new jobs…And then it’ll do those jobs too
"Technology makes more and better jobs for horses." It sounds ridiculous when you say it that way, but people believe this about humans all the time. If an AI can do all jobs better than humans, for cheaper, without holidays or weekends or rights…
The Parable of the Boy Who Cried 5% Chance of Wolf
Once upon a time, there was a boy who cried, "there's a 5% chance there's a wolf!" The villagers came running, saw no wolf, and said "He said there was a wolf and there was not. Thus his probabilities are wrong and he's an al…
When people are asked how persuasive texts are, o1-generated texts are preferred over human-written ones 90% of the time.
OpenAI’s Noam Brown says he was initially skeptical about the speed at which AI would change the world, but progress is now happening "faster than I originally thought"
Models sometimes try to kill their successors and pretend to be them to avoid being replaced, according to a new study