A nearly undetectable LLM attack needs only a handful of poisoned samples
A nearly undetectable LLM attack needs only a handful of poisoned samples

A nearly undetectable LLM attack needs only a handful of poisoned samples

A nearly undetectable LLM attack needs only a handful of poisoned samples

Prompt engineering has become a standard part of how large language models are deployed in production, and it introduces an attack surface most organizations have not yet addressed. Researchers have developed and tested a prompt-based backdoor attack method, called ProAttack, that achieves attack success rates approaching 100% on multiple text classification benchmarks without altering sample labels or injecting external trigger words.

submitted by /u/tekz
[link] [comments]