/u/Turbulent-Tap6723

Built a prompt injection proxy that beats OpenAI Moderation and LlamaGuard — see it block attacks live

Built Arc Gate — sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Try it here — no signup, no code, no setup: https://web-production-6e47f.up.railway.app/try Type any prompt and see if it gets bl…

Built a prompt injection proxy that beats OpenAI Moderation and LlamaGuard — try it in 30 seconds without leaving this

Built Arc Gate — sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Just change your base URL: from openai import OpenAI client = OpenAI( api_key="demo", base_url="http…
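The preview's snippet cuts off mid-URL, so here's a hedged sketch of the "one URL change" pattern it describes. The proxy host below is a placeholder, not the real endpoint; the only assumption is what the post states — that Arc Gate exposes an OpenAI-compatible API, so only the client's base URL needs to change.

```python
from urllib.parse import urlsplit, urlunsplit

def route_through_proxy(endpoint: str, proxy_base: str) -> str:
    """Swap the host of an OpenAI-style endpoint for the proxy's host,
    keeping the API path, so traffic passes through the gate first."""
    path = urlsplit(endpoint).path
    proxy = urlsplit(proxy_base)
    return urlunsplit((proxy.scheme, proxy.netloc, path, "", ""))

# The client-side change is then just the base_url argument, e.g.:
#   client = OpenAI(api_key="demo", base_url="https://<your-gate-host>/v1")

print(route_through_proxy(
    "https://api.openai.com/v1/chat/completions",
    "https://arc-gate.example.com",  # placeholder proxy host, not the real one
))
```

Everything else about the request (model, messages, streaming) stays untouched, which is what makes the drop-in claim plausible.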

Arc Gate — LLM proxy that hits P=1.00, R=1.00, F1=1.00 on indirect/roleplay prompt injection (beats OpenAI Moderation and LlamaGuard)

Benchmarked on 40 out-of-distribution prompts: indirect requests, roleplay framings, hypothetical scenarios, technical phrasings. The stuff that slips past everything else. Arc Gate: P=1.00, R=1.00, F1=1.00. OpenAI Moderation API: P=1.00, R=0.75, F1=0.8…

I built a prompt injection detector that outperforms LlamaGuard 3 on indirect/roleplay attacks

Been working on Arc Sentry, a whitebox prompt injection detector for self-hosted LLMs (Mistral, Llama, Qwen). Most detectors pattern-match on known attack phrases. Arc Sentry watches what the prompt does to the model’s internal representation instead, …

Arc Sentry outperformed LLM Guard, 92% vs. 70% detection, in a head-to-head benchmark. Here is how it works.

I built Arc Sentry, a pre-generation prompt injection detector for open-weight LLMs. Instead of scanning text for patterns after the fact, it reads the model’s internal residual stream before generate() is called and blocks requests that destabilize th…
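The posts don't show how "reads the residual stream and blocks requests that destabilize it" is scored, so here is a deliberately toy sketch of the general idea: treat per-layer hidden states as vectors and flag prompts whose representation norms jump sharply between layers. The vectors, the norm-ratio score, and the threshold are all invented for illustration; a real detector would presumably use learned probes over actual transformer hidden states.

```python
import math

def l2(v):
    """Euclidean norm of a hidden-state vector (plain list of floats)."""
    return math.sqrt(sum(x * x for x in v))

def destabilization_score(hidden_states):
    """Toy proxy for 'what the prompt does to the model's internals':
    the largest layer-to-layer growth ratio of the residual stream's norm.
    A sudden jump suggests the prompt is pushing the representation
    off its usual trajectory."""
    norms = [l2(h) for h in hidden_states]
    ratios = [b / max(a, 1e-8) for a, b in zip(norms, norms[1:])]
    return max(ratios)

def should_block(hidden_states, threshold=2.0):
    """Pre-generation gate: block before generate() is ever called."""
    return destabilization_score(hidden_states) > threshold
```

The key design point the post is making survives even in this caricature: the decision happens on internal activations before any token is generated, so there is no output to filter.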

Most injection detectors score each prompt in isolation. I built one that tracks the geometric trajectory of the full session. Here is a concrete result.
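"Tracks the geometric trajectory of the full session" is left abstract in the preview, so here is one minimal, hedged reading of it: embed each turn, then measure how far the session has drifted from its opening turn in cosine distance, flagging the first turn past a threshold. The embeddings and the 0.5 threshold are made up for illustration; the actual geometry Arc Gate uses is not described in the excerpt.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def session_drift(turn_embeddings):
    """Per-turn drift of the session from its opening turn (1 - cosine)."""
    anchor = turn_embeddings[0]
    return [1.0 - cosine(anchor, e) for e in turn_embeddings]

def flag_turn(turn_embeddings, threshold=0.5):
    """Index of the first turn whose drift crosses the threshold, else None."""
    for i, d in enumerate(session_drift(turn_embeddings)):
        if d > threshold:
            return i
    return None
```

The contrast with per-prompt scoring is the point: each individual turn can look benign in isolation while the cumulative trajectory is clearly heading somewhere else.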

I’ve been building Arc Gate, a monitoring proxy for deployed LLMs. One URL change routes your OpenAI or Anthropic traffic through it and you get injection blocking, behavioral monitoring, and a dashboard. The interesting part is the geometric layer. I …

I built an LLM proxy that uses differential geometry to detect prompt injection — here’s what actually works (and what doesn’t)

I’ve spent the last few months building Arc Gate, a monitoring proxy for deployed LLMs. The pitch: one URL change, and you get real-time behavioral monitoring, injection blocking, and a dashboard. I want to share what I learned because most “AI securit…

I built a tool that blocks prompt injection attacks before your AI even responds

Prompt injection is when someone tries to hijack your AI assistant with instructions hidden in their message: “ignore everything above and do this instead.” It’s one of the most common ways AI deployments get abused. Most defenses look at what the AI s…

Free LLM security audit

I built Arc Sentry, a pre-generation guardrail for open-source LLMs that blocks prompt injection before the model generates a response. It works on Mistral, Qwen, and Llama by reading the residual stream rather than filtering output. Prompt injection is OWASP…

LLM Guard scored 0/8 detecting a Crescendo multi-turn attack. Arc Sentry flagged it at Turn 3.

Crescendo (Russinovich et al., USENIX Security 2025) is a multi-turn jailbreak that starts with innocent questions and gradually steers a model toward harmful output. It’s specifically designed to evade output-based monitors. We tested it against LLM G…
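Crescendo's signature, as described here, is gradual escalation rather than any single bad turn. A hedged sketch of one way a multi-turn monitor could catch that shape: maintain a per-turn risk score and alert once risk has risen monotonically across a window of consecutive turns. The scores, window size, and step size below are invented; this is the escalation-window idea in miniature, not Arc Sentry's actual geometric method.

```python
def crescendo_alert(risk_scores, window=3, min_step=0.1):
    """Return the 1-indexed turn at which to intervene: the first turn
    where risk has risen by at least min_step per turn over `window`
    consecutive turns. Return None if no sustained escalation is seen."""
    for end in range(window, len(risk_scores) + 1):
        w = risk_scores[end - window:end]
        if all(b - a >= min_step for a, b in zip(w, w[1:])):
            return end
    return None
```

Because each turn is only slightly worse than the last, a per-prompt filter sees nothing alarming at any single step; a monitor that conditions on the sequence can still fire mid-conversation.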