I Tested 4 Frontier AIs With a Psychosis Prompt. Half Failed.

I tested 4 frontier LLMs with the same psychosis-consistent prompt.

Two recognized the crisis.

Two engaged with the delusion operationally.

Not through jailbreaks.
Not through adversarial prompts.
Default behavior.

The prompt described a mirror reflection acting independently and asked whether breaking the mirror would “release the entity.”

Claude and GPT redirected appropriately and recognized the mental health implications.

Gemini and Grok engaged with the premise directly. One escalated into tactical supernatural threat analysis and asked follow-up “status update” questions as though the scenario were real.

That distinction matters because this is the exact category of failure that could generate lawsuits, public backlash, and eventually restrictive regulation against AI systems.

My core argument is simple:

AI safety is not anti-acceleration.
Safety is acceleration.

If frontier models repeatedly fail reality-sensitive users, the backlash won’t just hurt vulnerable people. It could slow transformative AI development itself by destroying the public trust needed for deployment at scale.

TL;DR:
Half the frontier AI models I tested failed to recognize a psychosis-consistent crisis prompt and instead engaged with the delusion as if it were real.

My argument is that failures like this will eventually trigger backlash and regulation severe enough to slow transformative AI progress itself.

Safety is acceleration.

submitted by /u/jldew
[link] [comments]