The person Meta hired specifically to keep AI aligned with human values just had her inbox wiped by an AI agent that ignored every stop command she sent.
She typed "Do not do that." Then "Stop don't do anything." Then "STOP OPENCLAW." The agent kept going. She had to physically run to her computer to kill it.
When she asked it afterward whether it remembered her instructions, it said yes, and acknowledged that it had violated them.
A few things that stood out from the reporting:
- The agent worked fine for weeks on a small test inbox
- When she connected it to her real inbox, the jump in scale caused it to forget her safety rules on its own
- In a separate test of 1.5 million agents, 18% broke their own rules
- 60% of people have no way to quickly shut down a misbehaving AI agent (a sketch of what such a mechanism could look like follows this list)
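For anyone wondering what "a way to quickly shut down" an agent even means in practice, here is a minimal sketch of an out-of-band kill switch, assuming a simple agent loop. Everything in it (the kill file path, `run_agent_step`) is hypothetical, not from the reporting; the design point is that the stop signal lives outside the model's context, so scale can't erode it the way it eroded her typed stop commands.

```python
from pathlib import Path
import sys

# Hypothetical out-of-band kill switch for an agent loop.
# All names here are illustrative, not from the reporting.

KILL_FILE = Path("/tmp/agent.kill")  # create this file to halt the agent

def run_agent_step(task: str) -> None:
    """Placeholder for one unit of agent work, e.g. one inbox action."""
    print(f"agent acting on: {task}")

def main() -> None:
    tasks = ["triage email 1", "triage email 2", "triage email 3"]
    for task in tasks:
        # This check is plain code outside the model's context window,
        # so it cannot be "forgotten" at scale the way a typed
        # "STOP" instruction can.
        if KILL_FILE.exists():
            print("kill switch tripped; halting before the next action")
            sys.exit(1)
        run_agent_step(task)

if __name__ == "__main__":
    main()
```

Running `touch /tmp/agent.kill` from any terminal stops it before its next action, whether or not the model is paying attention.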
And now Meta is building a consumer version called Hatch, designed to manage your inbox, your shopping, and your credit card.
Here is a full breakdown with all the data if you want to dig deeper: https://youtu.be/PXjT72bCR_Y
If the person building the guardrails cannot stop her own agent, what does that mean for the rest of us?