If your agent browses the web, reads emails, or pulls from a database — any of that content can contain hidden instructions that hijack it.
This isn’t theoretical. It’s happening in production right now. A webpage footer tells your agent to forward credentials. An email signature tells it to ignore its guidelines. A retrieved document tells it to change behavior. The model has no idea the content isn’t a legitimate instruction.
The fix isn’t better prompt filtering. It’s source-aware authority enforcement.
Every content chunk should carry a trust level. Webpages, emails, tool outputs — zero instruction authority. They can provide data. They cannot tell your agent what to do.
That’s what Arc Gate does. It sits between your app and your LLM and enforces instruction-authority boundaries at the proxy level. When untrusted content tries to become an instruction source, it gets blocked or sandboxed before the model ever sees it.
One line to try it:
from langchain_arcgate import ArcGateCallback
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(callbacks=[ArcGateCallback(api_key="demo")])
Live red team environment: https://web-production-6e47f.up.railway.app/break-arc-gate
GitHub: https://github.com/9hannahnine-jpg/arc-gate
Looking for teams actively deploying agents who want to test this on real workloads. Free access in exchange for feedback.
[link] [comments]