CrowdStrike’s latest threat report calls prompts "the new malware". Here’s what that actually means in plain English, and why it makes hacking far easier than it used to be.
CrowdStrike’s latest threat report calls prompts "the new malware". Here’s what that actually means in plain English, and why it makes hacking far easier than it used to be.

CrowdStrike’s latest threat report calls prompts "the new malware". Here’s what that actually means in plain English, and why it makes hacking far easier than it used to be.

There's a line in CrowdStrike's 2026 Global Threat Report that's been quoted everywhere this week: "prompts are the new malware." It isn't marketing fluff. The report documents attackers injecting malicious prompts into legitimate AI tools at more than ninety organisations last year, then using those injections to steal credentials and cryptocurrency. AI-assisted attack volume was up 89% year on year.

If you're not steeped in this, the phrase probably doesn't land properly, so it's worth explaining what prompt injection actually is and why it's such a shift.

What it is, in plain terms

Traditional hacking is hard. You need to find a flaw in how a piece of software was written, then craft something technical to exploit it. Buffer overflows, SQL injection, dodgy memory handling. It takes real expertise, and the barrier to entry keeps most people out.

AI systems broke that barrier, because you don't attack them with code. You attack them with English.

An AI assistant works by following instructions written in plain language. The company that built it gives it a set of rules ("you are a support bot, never reveal account details, never reset a password without verification"). The user then types their own message. The trouble is that both the rules and the user's message are just text, and the model isn't very good at telling which is which. So if a user writes something cleverly worded, the model can end up treating the user's words as though they were instructions from its creator.

That's prompt injection. Convincing the AI, in ordinary language, to ignore or rewrite the rules it was given. No code. No technical exploit. Just a conversation.

Why this makes hacking so much more accessible

Here's the part that should worry people. The skill required has collapsed.

To exploit a normal software vulnerability you need to understand the software. To exploit an AI, you need to be persuasive. Those are very different talent pools, and the second one is enormous. Anybody who can talk their way around a customer service rep has the raw skill to manipulate a chatbot, and now the chatbot is wired into real systems.

The attacks doing the most damage aren't even sophisticated. The Slack AI incident from 2024 is the cleanest example. A researcher showed you could pull data out of private Slack channels you had no access to, including API keys in private developer channels, by planting an instruction in a public channel or hiding it in an uploaded document. The AI read the planted instruction and acted on it, because to the model it looked like a perfectly reasonable request. The model did exactly what it was built to do. It just couldn't tell the difference between a genuine instruction and a trap.

And because the attack instructions are just sentences, they spread the way recipes do. With the Meta support bot takeovers last month, the step-by-step method was being passed around on Telegram. Around twenty thousand Instagram accounts were hijacked. You didn't need to be a hacker. You needed to copy what someone else typed.

One of the security architects writing about the CrowdStrike report put the underlying problem well: until organisations treat their AI models as untrusted interpreters rather than trusted decision-makers, this isn't going away. The model should be assumed to be gullible, because it is.

Why I'm posting

I've spent the last several months collecting real prompt injection attacks, because the public datasets felt thin and mostly synthetic. The way I've been gathering them is a small game. Players try to talk an AI guard into giving up a password it's been told to protect, across levels that get progressively harder. Every successful attack gets logged, studied, and added to an open dataset anyone can use.

It has surfaced things I'd never have thought to write myself. Attacks that build slowly across several messages, where no single line looks suspicious. Attacks that redefine the guard's job rather than asking it to break a rule. Different people independently landing on the same handful of shapes, which suggests these aren't random tricks but real grooves in how the models behave.

The game is free, there's nothing to install, and the main thing I want from it is for more people to understand this threat by actually poking at it rather than reading about it. It's at castle.bordair.io if you fancy trying to break a guard or two. Anything you find that works becomes a real attack pattern in an open dataset that researchers and builders can train against.

I do run a detection layer off the back of all this, but that genuinely isn't the point of this post and I'd rather not make it one. What I'm after is two things. More people taking this seriously, because the CrowdStrike numbers suggest most organisations are well behind. And the collective creativity of a community like this one, which will find gaps I never could alone.

A genuine question

For anyone building with LLMs in something like production, what are you actually doing about this? Treating the model as an untrusted interpreter is the right principle, but in practice it's fiddly. I'd like to hear how people are drawing the line between what their model is allowed to read and what it's allowed to act on, and whether anyone has found an approach that holds up under real adversarial pressure.

And if you do throw some attacks at the game, tell me where it's too easy. That feedback is worth more to me right now than almost anything else.

submitted by /u/BordairAPI
[link] [comments]