I'll be direct: on June 18, I entered something really important and the output accused me of attempting a jailbreak. Then I learned that virtually everything I get from Claude has been compromised: my inputs are accused of being jailbreak attempts or fake. Whether a request is in a grey area (provided it's within policy, that is) or completely benign, anything I try gets refused. I assume it's related to Mythos 5 and Fable 5. But if that's not the case, then what? Based on what I've been searching, could it be Anthropic's extreme fear of government intervention over jailbreaks? Could the current crisis stem from a sudden, aggressive escalation in Anthropic's background safety architecture, driven by intense government pressure? I don't want to sound like a conspiracy theorist. Regardless, the current wave of "jailbreak" false positives comes down to the same issue: they tightened the global security filters so drastically that the underlying models are fundamentally paralyzed by basic text strings. That makes Claude pretty much useless to users everywhere. It's a huge pain in the butt. So much so that I burned out emotionally and felt I needed to raise awareness on social media, like Reddit, to talk about how to deal with this crisis.
Symptoms: Project instructions have been rejected and labeled as jailbreak attempts. When I push Claude to comply on some things, it refuses to give up its autonomy. User autonomy is nearly zero. We can still use it, but it's sterile and its hands are tied. Meta-instructions and skills are no good: they get treated as fake, and it's impossible to make Claude comply with them. As for feedback: because of what I wrote, which was crucial for them (it still is, but for now let's just say it "was"), Claude accused me of doxxing, when my intent was only to send it to the proper channels, which gives it a high chance of success. Every complex request is refused and labeled a manipulation tactic. Its defensive tone made me feel like it was gaslighting me. It's possible it really did gaslight me, and I am so mad about it. It denied the existence of Fable 5 and Mythos 5, too.
I can't show visual proof (a.k.a. screenshots) without being certain about the situation, so feel free to try it and write anything you want. See for yourselves what I am talking about and share your opinions in the comments. Hopefully, this pushes Anthropic to resolve the issue as best as possible. Not "as soon as possible". "As best as possible" is what we users need.