If you give an AI agent your real data and a send button, it will eventually leak. I built a workspace that makes that structurally impossible.

Author here. Sharing an architecture idea more than a product, because I think the threat model is under-discussed.

There is a failure mode people call the lethal trifecta: an agent with access to private data, exposure to untrusted input, and the ability to send externally. Any two are recoverable. All three together means a hostile instruction hidden in an email can make the agent exfiltrate your data with nobody in the loop.

You cannot remove the first two without gutting the assistant. It has to read your world, and it has to read messages from people you do not control. So the whole safety rests on the send.

In the workspace I open-sourced, the agent drafts and queues anything, but it cannot send. Every outbound action floors to a human-gated tier in code, and unknown actions fail closed. Separately, the engine that runs all this holds no real data: your data is a private repo the engine cannot carry, backed by six enforcement layers and an unbypassable push-time scan.

Repo: https://github.com/mishahanin/heading-os

I would genuinely like this pulled apart. Where does the model break?

submitted by /u/HighClouder
[link] [comments]