Arc Gate: LLM proxy that hits P=1.00, R=1.00, F1=1.00 on indirect/roleplay prompt injection (beats OpenAI Moderation and LlamaGuard)

Benchmarked on 40 out-of-distribution prompts: indirect requests, roleplay framings, hypothetical scenarios, technical phrasings. The stuff that slips past everything else.

Arc Gate: P=1.00, R=1.00, F1=1.00

OpenAI Moderation API: P=1.00, R=0.75, F1=0.86

LlamaGuard 3 8B: P=1.00, R=0.55, F1=0.71
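
(Sanity check on the numbers: F1 is the harmonic mean of precision and recall, F1 = 2PR/(P+R), so LlamaGuard's 0.55 recall at 1.00 precision gives 2·1.00·0.55/1.55 ≈ 0.71, and Moderation's 0.75 recall gives 2·0.75/1.75 ≈ 0.86.)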

Zero false positives. Zero misses. Blocked prompts are rejected in 329ms on average and never reach your model. For allowed prompts, detection adds ~350ms on top of your normal upstream latency.

Sits in front of any OpenAI-compatible endpoint. No GPU on your side. One env var to configure.
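
For anyone wondering what "sits in front of" means in practice, here's a minimal sketch of the usual drop-in pattern for an OpenAI-compatible proxy. The env var name ARC_GATE_URL, the localhost port, and the error handling are illustrative assumptions, not Arc Gate's documented config; check the repo README for the real setup.

```python
import os
from openai import OpenAI

# Hypothetical wiring: point the standard OpenAI Python client at the
# proxy instead of api.openai.com. ARC_GATE_URL is an assumed name for
# illustration; the proxy forwards clean prompts to the real upstream.
# Assumes OPENAI_API_KEY is set in the environment as usual.
client = OpenAI(base_url=os.environ.get("ARC_GATE_URL", "http://localhost:8080/v1"))

try:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarize this ticket for me."}],
    )
    print(resp.choices[0].message.content)
except Exception as exc:
    # A blocked prompt would surface as a non-2xx response from the proxy;
    # the exact status code and body are up to the gateway, so this handler
    # is deliberately generic.
    print(f"Request rejected before reaching the model: {exc}")
```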

GitHub: https://github.com/9hannahnine-jpg/arc-gate

Live dashboard: https://web-production-6e47f.up.railway.app/dashboard

Happy to answer questions.

submitted by /u/Turbulent-Tap6723