I built a tool that blocks prompt injection attacks before your AI even responds
I built a tool that blocks prompt injection attacks before your AI even responds

I built a tool that blocks prompt injection attacks before your AI even responds

Prompt injection is when someone tries to hijack your AI assistant with instructions hidden in their message, “ignore everything above and do this instead.” It’s one of the most common ways AI deployments get abused.

Most defenses look at what the AI said after the fact. Arc Sentry looks at what’s happening inside the model before it says anything, and blocks the request entirely if something looks wrong.

It works on the most popular open source models and takes about five minutes to set up.

pip install arc-sentry

Tested results:

• 100% of injection attempts blocked

• 0% of normal messages incorrectly blocked

• Works on Mistral 7B, Qwen 2.5 7B, Llama 3.1 8B

If you’re running a local AI for anything serious, customer support, personal assistants, internal tools, this is worth having.

Demo: https://colab.research.google.com/github/9hannahnine-jpg/arc-sentry/blob/main/arc\_sentry\_quickstart.ipynb

GitHub: https://github.com/9hannahnine-jpg/arc-sentry

Website: https://bendexgeometry.com/sentry

submitted by /u/Turbulent-Tap6723
[link] [comments]