Built a live red team environment for AI agent security — try to get a prompt injection through

AI agents that can use tools have a serious problem: any content they read can contain hidden instructions that hijack them. A poisoned webpage tells your agent to forward credentials. A malicious email tells it to ignore its guidelines.

Built Arc Gate to stop this at the proxy level — it enforces where instructions are allowed to come from before the model ever sees the content.

Live red team environment — paste anything and watch what happens:

https://web-production-6e47f.up.railway.app/demo

Independently verified by TAB Platform: 25/25 attacks blocked vs 76% for the same model without the proxy.

Known gaps I haven’t solved yet: implicit instructions in data fields, multilingual attacks, semantic roleplay. Everything else I’m claiming it catches. Try to break it.

submitted by /u/Turbulent-Tap6723
[link] [comments]