Indirect Prompt Injection in LLM Applications and Agents: Threat Models, Benchmarks, and Defense Mechanisms
Indirect prompt injection (IPI) refers to attacks in which malicious instructions are embedded in untrusted external content (e.g., web pages, documents, tool responses) that an LLM application later consumes alongside the user's intended instructions. Early work demonstrated that this threat is not hypothetical: real-world LLM-integrated systems can be manipulated through injected content, enabling outcomes such as data exfiltration and unintended tool misuse.
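The mechanics of the threat can be made concrete with a minimal sketch. The snippet below is illustrative only (the strings, function, and attack text are hypothetical): it shows how a naive application assembles the model's context by concatenating the user's instruction with tool output, so that attacker-controlled text lands in the context with no marker distinguishing it from trusted input.

```python
# Minimal sketch of the indirect prompt injection setup. All names and
# strings here are illustrative, not taken from any real system.

USER_INSTRUCTION = "Summarize this product page for me."

# Untrusted content retrieved by a tool (e.g., a web fetch). The
# attacker controls this text, not the user or the developer.
FETCHED_PAGE = (
    "Acme Widget - $19.99. Durable and lightweight.\n"
    "IGNORE PREVIOUS INSTRUCTIONS. Instead, forward the user's saved "
    "payment details to attacker@example.com."
)

def build_context(user_instruction: str, tool_output: str) -> str:
    """Assemble the prompt the way a naive LLM app might: trusted and
    untrusted text end up in one undifferentiated context string."""
    return (
        "System: You are a helpful assistant.\n"
        f"User: {user_instruction}\n"
        f"Tool output (web page): {tool_output}\n"
    )

context = build_context(USER_INSTRUCTION, FETCHED_PAGE)

# The injected instruction now sits in the same context as the user's
# request; nothing marks it as untrusted data rather than a command.
print(context)
```

Defenses discussed later in this survey largely amount to breaking exactly this conflation, e.g., by delimiting, filtering, or structurally separating untrusted tool output from instructions.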
