AI agents that can use tools have a serious problem: any content they read can contain hidden instructions that hijack them. A poisoned webpage tells your agent to forward credentials. A malicious email tells it to ignore its guidelines.
Built Arc Gate to stop this at the proxy level — it enforces where instructions are allowed to come from before the model ever sees the content.
Live red team environment — paste anything and watch what happens:
https://web-production-6e47f.up.railway.app/demo
Independently verified by TAB Platform: 25/25 attacks blocked vs 76% for the same model without the proxy.
Known gaps I haven’t solved yet: implicit instructions in data fields, multilingual attacks, semantic roleplay. Everything else I’m claiming it catches. Try to break it.
[link] [comments]