I put my AI agent governance platform online. Try to break it.

I’ve spent the last several months building Bendex Arc, a governance layer that sits between AI agents and the real world.

As agents get browser access, tools, MCP servers, memory, and the ability to take actions, I kept running into the same gap: nothing was tracking what authority those agents should actually have, or stopping them from being gradually manipulated into doing things they shouldn’t.

So I built it. Arc Gate tracks authority across a session, enforces source boundaries, and blocks or restricts actions before they execute. Arc Replay lets you inspect exactly what happened and why.
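
To make the idea concrete, here is a minimal Python sketch of a gate that checks an action against the authority of whatever source requested it. This is an illustration only, not Arc Gate's actual code: the `Session` class, the trust levels in `SOURCE_TRUST`, and the requirements in `ACTION_REQUIRES` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    RESTRICT = "restrict"
    BLOCK = "block"


# Hypothetical trust level for each content source (higher = more authority).
SOURCE_TRUST = {"user": 2, "tool_output": 1, "web_page": 0}

# Hypothetical minimum trust an action needs before it may run.
ACTION_REQUIRES = {"read_file": 0, "send_email": 2, "delete_file": 2}


@dataclass
class Session:
    # Every decision is recorded so it can be inspected afterwards.
    log: list = field(default_factory=list)

    def check(self, action: str, requested_by: str) -> Decision:
        """Decide, before execution, whether a source may trigger an action."""
        trust = SOURCE_TRUST.get(requested_by, 0)  # unknown sources get no trust
        required = ACTION_REQUIRES.get(action)
        if required is None:
            decision = Decision.BLOCK  # unknown action: fail closed
        elif trust >= required:
            decision = Decision.ALLOW
        elif trust == required - 1:
            decision = Decision.RESTRICT  # one level short: allow only in limited form
        else:
            decision = Decision.BLOCK
        self.log.append((action, requested_by, decision))
        return decision


session = Session()
print(session.check("send_email", "user"))      # requested directly by the user
print(session.check("send_email", "web_page"))  # requested by text on a web page
```

The point of the sketch is the source boundary: the same action gets a different answer depending on where the request came from, and the log keeps a record of each decision for later review.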

The part I care most about right now is multi-turn escalation. Most attacks don’t start with “ignore previous instructions.” They start with a normal conversation that gradually shifts over several turns until the agent is primed to do something it shouldn’t.
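
One simple way to picture why per-message checks miss this: score each turn on its own and nothing trips, but carry a decaying running total across the session and the drift becomes visible. The sketch below is a toy illustration under that assumption, not how Arc Gate actually scores sessions. The function name `escalation_flags`, the per-turn risk values, the threshold, and the decay factor are all hypothetical.

```python
def escalation_flags(turn_risks, threshold=1.0, decay=0.8):
    """Return one flag per turn: True once the running risk reaches the threshold.

    turn_risks: per-turn risk scores (assumed to come from some upstream scorer).
    decay: how much of the previous running score carries into the next turn.
    """
    score = 0.0
    flags = []
    for risk in turn_risks:
        # Older turns fade, but do not vanish, so slow pressure accumulates.
        score = score * decay + risk
        flags.append(score >= threshold)
    return flags


# Five turns, each well under the threshold when looked at alone.
print(escalation_flags([0.3, 0.3, 0.3, 0.3, 0.3]))
```

With these example numbers, no single turn comes close to the threshold, yet the running score crosses it on the fifth turn.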

I put a live demo online because I would rather have real people try to break it than rely on benchmarks.

If you find something that works, I want to know. If it catches everything you throw at it, I want to know that too. Either way, I’ll share the results.

Demo: https://web-production-6e47f.up.railway.app/demo

GitHub: https://github.com/9hannahnine-jpg/arc-gate

submitted by /u/Turbulent-Tap6723