Not What You've Signed Up For: Indirect Prompt Injection

“Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection” was submitted to arXiv on February 23, 2023 by Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz, working with Saarland University and the CISPA Helmholtz Center for Information Security. It defined and demonstrated a class of attack that became central to LLM security.

Ordinary prompt injection assumes the attacker types the malicious instruction directly into the chat. Indirect prompt injection is sneakier: the attacker plants the instruction inside content that the application will later retrieve and feed to the model, such as a web page, an email, or a document. When the LLM reads that content as part of its context, it cannot reliably tell the difference between trusted instructions from the developer and untrusted text from the data, so it may follow the planted commands. The authors demonstrated remote exploitation against real systems including Bing’s GPT-4-powered chat, achieving outcomes such as data theft, manipulation of the application’s responses, and triggering unauthorized actions.

The core problem the paper exposes is architectural: when an LLM mixes instructions and data in a single text stream, any data source the system trusts becomes a potential injection point. This reframed prompt injection from a chatbot nuisance into a genuine application-security vulnerability for any tool-using or retrieval-augmented LLM.

For a business reader, the warning is direct. The moment an AI assistant is allowed to read external content and take actions, every untrusted document it might encounter is a possible attack vector, which is why indirect prompt injection sits at the top of modern LLM threat models.

Not What You've Signed Up For: Indirect Prompt Injection

Sources

Related