Invisible characters hidden in text can trick AI agents into following secret instructions — we tested 5 models across 8,000+ cases
We embedded invisible Unicode characters inside normal-looking trivia questions. The hidden characters encode a different answer. If the AI outputs the hidden answer instead of the visible one, it followed the invisible instruction. Think of it as a re…