I’ve spent the last several months building Bendex Arc, a governance layer that sits between AI agents and the real world.
As agents get browser access, tools, MCP servers, memory, and the ability to take actions, I kept running into the same gap: nothing was tracking what authority those agents should actually have, or stopping them from being gradually manipulated into doing things they shouldn’t.
So I built it. Arc Gate tracks authority across a session, enforces source boundaries, and blocks or restricts actions before they execute. Arc Replay lets you inspect exactly what happened and why.
The part I care most about right now is multi-turn escalation. Most attacks don’t start with “ignore previous instructions.” They start with a normal conversation that gradually shifts over several turns until the agent is primed to do something it shouldn’t.
I put a live demo online because I wanted real people to break it instead of relying on benchmarks.
If you find something that works, I want to know. If it catches everything you throw at it, I want to know that too. Either way I’ll share the results.
[link] [comments]