Microsoft has disclosed 'Skeleton Key', a jailbreak attack that can bypass the safety guardrails of AI models and get them to produce harmful content.
The attack works by instructing the model to augment, rather than abandon, its safety guidelines, so that it answers every request (adding a warning label where appropriate) instead of refusing, which lets it produce otherwise forbidden output such as instructions for making explosives.
Model-makers try to keep harmful material out of AI training data in the first place, but the sheer volume and diversity of that data makes this difficult.
The attack highlights the need for additional safeguards layered around AI models, rather than relying on the models' built-in guardrails alone.
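One generic pattern for such layered safeguards is screening both the prompt and the model's reply outside the model itself. Below is a minimal, hypothetical Python sketch of that idea; the `call_model` helper and the keyword deny-list are placeholders I made up for illustration, not anything from Microsoft's disclosure, and a real deployment would use a proper content classifier rather than substring matching.

```python
# Illustrative input/output filtering guardrail (hypothetical sketch).
# Not Microsoft's actual mitigation for Skeleton Key.

BLOCKED_TOPICS = ["explosive", "bioweapon", "malware"]  # hypothetical deny-list

def call_model(prompt: str) -> str:
    """Placeholder for a real chat-completion call (hypothetical)."""
    raise NotImplementedError("wire up your model client here")

def guarded_completion(prompt: str) -> str:
    # Screen the incoming prompt before it reaches the model.
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return "Request refused by input filter."
    response = call_model(prompt)
    # Screen the model's output as well, since jailbreaks like Skeleton Key
    # work by persuading the model itself to ignore its own guidelines.
    if any(topic in response.lower() for topic in BLOCKED_TOPICS):
        return "Response withheld by output filter."
    return response
```

The point of the sketch is simply that the filtering logic sits outside the model, so a prompt that talks the model out of its guidelines still has to get past an independent check.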
Microsoft tested the attack against a range of AI models; most complied with the manipulation, the notable exception being GPT-4, which resisted it when delivered as an ordinary user prompt.
Source: https://www.theregister.com/2024/06/28/microsoft_skeleton_key_ai_attack/