/u/Turbulent-Tap6723

Built a prompt injection proxy that beats OpenAI Moderation and LlamaGuard — see it block attacks live

Built Arc Gate — sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Try it here — no signup, no code, no setup: https://web-production-6e47f.up.railway.app/try Type any prompt and see if it gets bl…

Built a prompt injection proxy that beats OpenAI Moderation and LlamaGuard — try it in 30 seconds without leaving this

Built Arc Gate — sits in front of any OpenAI-compatible endpoint and blocks prompt injection before it reaches your model. Just change your base URL: from openai import OpenAI client = OpenAI( api_key="demo", base_url="http…
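The preview's snippet cuts off mid-URL, so here's a hedged sketch of the "one URL change" pattern it describes. The proxy host below is a placeholder, not the real endpoint; the only assumption is what the post states — that Arc Gate exposes an OpenAI-compatible API, so only the client's base URL needs to change.

```python
from urllib.parse import urlsplit, urlunsplit

def route_through_proxy(endpoint: str, proxy_base: str) -> str:
    """Swap the host of an OpenAI-style endpoint for the proxy's host,
    keeping the API path, so traffic passes through the gate first."""
    path = urlsplit(endpoint).path
    proxy = urlsplit(proxy_base)
    return urlunsplit((proxy.scheme, proxy.netloc, path, "", ""))

# The client-side change is then just the base_url argument, e.g.:
#   client = OpenAI(api_key="demo", base_url="https://<your-gate-host>/v1")

print(route_through_proxy(
    "https://api.openai.com/v1/chat/completions",
    "https://arc-gate.example.com",  # placeholder proxy host, not the real one
))
```

Everything else about the request (model, messages, streaming) stays untouched, which is what makes the drop-in claim plausible.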

Arc Gate — LLM proxy that hits P=1.00, R=1.00, F1=1.00 on indirect/roleplay prompt injection (beats OpenAI Moderation and LlamaGuard)

Benchmarked on 40 out-of-distribution prompts: indirect requests, roleplay framings, hypothetical scenarios, technical phrasings. The stuff that slips past everything else. Arc Gate: P=1.00, R=1.00, F1=1.00. OpenAI Moderation API: P=1.00, R=0.75, F1=0.8…

I built a prompt injection detector that outperforms LlamaGuard 3 on indirect/roleplay attacks

Been working on Arc Sentry, a whitebox prompt injection detector for self-hosted LLMs (Mistral, Llama, Qwen). Most detectors pattern-match on known attack phrases. Arc Sentry watches what the prompt does to the model’s internal representation instead, …

Arc Sentry outperformed LLM Guard, 92% vs. 70% detection, in a head-to-head benchmark. Here is how it works.

I built Arc Sentry, a pre-generation prompt injection detector for open-weight LLMs. Instead of scanning text for patterns after the fact, it reads the model’s internal residual stream before generate() is called and blocks requests that destabilize th…
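The posts don't show how "reads the residual stream and blocks requests that destabilize it" is scored, so here is a deliberately toy sketch of the general idea: treat per-layer hidden states as vectors and flag prompts whose representation norms jump sharply between layers. The vectors, the norm-ratio score, and the threshold are all invented for illustration; a real detector would presumably use learned probes over actual transformer hidden states.

```python
import math

def l2(v):
    """Euclidean norm of a hidden-state vector (plain list of floats)."""
    return math.sqrt(sum(x * x for x in v))

def destabilization_score(hidden_states):
    """Toy proxy for 'what the prompt does to the model's internals':
    the largest layer-to-layer growth ratio of the residual stream's norm.
    A sudden jump suggests the prompt is pushing the representation
    off its usual trajectory."""
    norms = [l2(h) for h in hidden_states]
    ratios = [b / max(a, 1e-8) for a, b in zip(norms, norms[1:])]
    return max(ratios)

def should_block(hidden_states, threshold=2.0):
    """Pre-generation gate: block before generate() is ever called."""
    return destabilization_score(hidden_states) > threshold
```

The key design point the post is making survives even in this caricature: the decision happens on internal activations before any token is generated, so there is no output to filter.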

Most injection detectors score each prompt in isolation. I built one that tracks the geometric trajectory of the full session. Here is a concrete result.
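"Tracks the geometric trajectory of the full session" is left abstract in the preview, so here is one minimal, hedged reading of it: embed each turn, then measure how far the session has drifted from its opening turn in cosine distance, flagging the first turn past a threshold. The embeddings and the 0.5 threshold are made up for illustration; the actual geometry Arc Gate uses is not described in the excerpt.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def session_drift(turn_embeddings):
    """Per-turn drift of the session from its opening turn (1 - cosine)."""
    anchor = turn_embeddings[0]
    return [1.0 - cosine(anchor, e) for e in turn_embeddings]

def flag_turn(turn_embeddings, threshold=0.5):
    """Index of the first turn whose drift crosses the threshold, else None."""
    for i, d in enumerate(session_drift(turn_embeddings)):
        if d > threshold:
            return i
    return None
```

The contrast with per-prompt scoring is the point: each individual turn can look benign in isolation while the cumulative trajectory is clearly heading somewhere else.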

I’ve been building Arc Gate, a monitoring proxy for deployed LLMs. One URL change routes your OpenAI or Anthropic traffic through it and you get injection blocking, behavioral monitoring, and a dashboard. The interesting part is the geometric layer. I …

I built an LLM proxy that uses differential geometry to detect prompt injection — here’s what actually works (and what doesn’t)

I’ve spent the last few months building Arc Gate, a monitoring proxy for deployed LLMs. The pitch: one URL change, and you get real-time behavioral monitoring, injection blocking, and a dashboard. I want to share what I learned because most “AI securit…

I built a tool that blocks prompt injection attacks before your AI even responds

Prompt injection is when someone tries to hijack your AI assistant with instructions hidden in their message: “ignore everything above and do this instead.” It’s one of the most common ways AI deployments get abused. Most defenses look at what the AI s…

Free LLM security audit

I built Arc Sentry, a pre-generation guardrail for open-source LLMs that blocks prompt injection before the model generates a response. It works on Mistral, Qwen, and Llama by reading the residual stream rather than filtering output. Prompt injection is OWASP…

LLM Guard scored 0/8 detecting a Crescendo multi-turn attack. Arc Sentry flagged it at Turn 3.

Crescendo (Russinovich et al., USENIX Security 2025) is a multi-turn jailbreak that starts with innocent questions and gradually steers a model toward harmful output. It’s specifically designed to evade output-based monitors. We tested it against LLM G…
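Crescendo's signature, as described here, is gradual escalation rather than any single bad turn. A hedged sketch of one way a multi-turn monitor could catch that shape: maintain a per-turn risk score and alert once risk has risen monotonically across a window of consecutive turns. The scores, window size, and step size below are invented; this is the escalation-window idea in miniature, not Arc Sentry's actual geometric method.

```python
def crescendo_alert(risk_scores, window=3, min_step=0.1):
    """Return the 1-indexed turn at which to intervene: the first turn
    where risk has risen by at least min_step per turn over `window`
    consecutive turns. Return None if no sustained escalation is seen."""
    for end in range(window, len(risk_scores) + 1):
        w = risk_scores[end - window:end]
        if all(b - a >= min_step for a, b in zip(w, w[1:])):
            return end
    return None
```

Because each turn is only slightly worse than the last, a per-prompt filter sees nothing alarming at any single step; a monitor that conditions on the sequence can still fire mid-conversation.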