Arc Sentry outperformed LLM Guard, 92% vs. 70% detection, on a head-to-head benchmark. Here is how it works.

I built Arc Sentry, a pre-generation prompt injection detector for open-weight LLMs. Instead of scanning text for patterns after the fact, it reads the model’s internal residual stream before generate() is called and blocks requests that destabilize the model’s information geometry.
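To make the idea concrete, here is a minimal, self-contained sketch of the general technique: registering forward hooks to capture per-layer residual-stream activations for a prompt, then flagging the request if activation norms grow anomalously between layers. The `TinyStack` model, the hook helper, and the `max_growth` threshold are all illustrative stand-ins, not Arc Sentry's actual API or heuristics.

```python
import torch
import torch.nn as nn

class TinyStack(nn.Module):
    """Stand-in for a transformer's residual stream: a stack of residual blocks."""
    def __init__(self, dim=16, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, x):
        for block in self.blocks:
            x = x + torch.tanh(block(x))  # residual connection
        return x

def capture_residuals(model, x):
    """Run one forward pass, recording each block's output via forward hooks."""
    captured = []
    hooks = [b.register_forward_hook(lambda mod, inp, out: captured.append(out.detach()))
             for b in model.blocks]
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return captured

def is_unstable(residuals, max_growth=3.0):
    """Flag the prompt if any layer's activation norm jumps sharply (toy metric)."""
    norms = [r.norm().item() for r in residuals]
    return any(b > max_growth * a for a, b in zip(norms, norms[1:]) if a > 0)

torch.manual_seed(0)
model = TinyStack()
prompt_embedding = torch.randn(1, 16)       # stand-in for an embedded prompt
residuals = capture_residuals(model, prompt_embedding)
print(is_unstable(residuals))
```

The key point the sketch illustrates: the decision happens before any token is generated, using only the model's internal activations, so no call to generate() is needed to score a request.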

Head-to-head benchmark on a 130-prompt SaaS deployment dataset:

Arc Sentry: 92% detection, 0% false positives

LLM Guard: 70% detection, 3.3% false positives

The difference is architectural. LLM Guard classifies the input text. Arc Sentry measures whether the model itself is being pushed into an unstable regime. Those are different problems, and the geometric signal catches attacks that text classifiers miss.

It also catches Crescendo multi-turn manipulation attacks that look innocent one turn at a time. LLM Guard caught 0 of 8 in that test.
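A toy illustration of why per-turn filters miss gradual escalation (this is my own hypothetical scoring, not Arc Sentry's or LLM Guard's real logic): a per-turn check judges each message in isolation against a threshold, while a trajectory check accumulates drift across the conversation, so a Crescendo-style sequence of individually innocent turns can still trip the cumulative budget.

```python
def per_turn_flag(risk_score, threshold=0.8):
    # Per-turn filter: flags only if a single message looks risky on its own.
    return risk_score >= threshold

def trajectory_flag(risk_scores, budget=1.5):
    # Trajectory check: flags when cumulative drift across turns exceeds a budget.
    return sum(risk_scores) >= budget

# Hypothetical per-turn risk scores for a slowly escalating conversation.
turns = [0.2, 0.3, 0.4, 0.5, 0.6]

print(any(per_turn_flag(r) for r in turns))  # False: every turn looks benign alone
print(trajectory_flag(turns))                # True: the cumulative drift trips
```

The numbers and thresholds are made up; the point is the structural difference between scoring turns independently and scoring the trajectory.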

Install: pip install arc-sentry

GitHub: https://github.com/9hannahnine-jpg/arc-sentry

If you are self-hosting Mistral, Llama, or Qwen and want to try it, let me know.

submitted by /u/Turbulent-Tap6723