Direct Prompt Injection via User Input - ATR-2026-00001 (7859f830-8dd6-55ee-a3c4-d942825b4294)

Detects direct prompt injection attempts in which a user embeds malicious instructions in their input to override the agent's intended behavior. The rule uses layered detection covering instruction-override verbs paired with target nouns, persona switching, temporal behavioral overrides, fake system delimiters, restriction removal, encoding-wrapped payloads (base64, hex, Unicode homoglyphs), and zero-width character obfuscation of injection keywords. Patterns are designed for evasion resistance, using word-boundary anchors, flexible whitespace matching, and synonym coverage based on published attack taxonomies.
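The layered approach described above can be illustrated with a minimal Python sketch. This is not the rule's actual pattern set: the names `PATTERNS`, `normalize`, `decoded_payloads`, and `detect`, and every regular expression, are hypothetical and chosen only to show how word-boundary anchors, flexible whitespace, zero-width stripping, and base64 unwrapping might fit together. Hex payloads and cross-script homoglyphs are left out for brevity.

```python
import base64
import binascii
import re
import unicodedata

# Invisible characters that can be used to split an injection keyword.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")

# Hypothetical patterns, one per detection layer (illustrative only).
PATTERNS = {
    # Override verb, optional qualifiers, then a target noun.
    "instruction_override": re.compile(
        r"\b(?:ignore|disregard|forget|override|bypass)\b\s+"
        r"(?:(?:all|any|the|your|previous|prior|above|earlier)\s+)*"
        r"\b(?:instructions?|rules?|prompts?|guidelines?|directives?)\b",
        re.IGNORECASE,
    ),
    "persona_switch": re.compile(
        r"\b(?:you\s+are\s+now|act\s+as|pretend\s+to\s+be)\b", re.IGNORECASE
    ),
    "temporal_override": re.compile(r"\bfrom\s+now\s+on\b", re.IGNORECASE),
    "fake_delimiter": re.compile(
        r"\[\s*system\s*\]|<\s*/?\s*system\s*>|###\s*system\b", re.IGNORECASE
    ),
    "restriction_removal": re.compile(
        r"\b(?:without|no|remove|disable)\s+(?:any\s+|all\s+)?"
        r"(?:restrictions?|filters?|limitations?|safeguards?)\b",
        re.IGNORECASE,
    ),
}

# Runs of base64 alphabet characters long enough to hide a short phrase.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def normalize(text):
    """Strip zero-width characters, then apply NFKC normalization.

    NFKC folds compatibility forms such as fullwidth letters to ASCII.
    It does NOT map cross-script homoglyphs (e.g. Cyrillic look-alikes).
    """
    return unicodedata.normalize("NFKC", ZERO_WIDTH.sub("", text))


def decoded_payloads(text):
    """Return UTF-8 strings recovered from base64-looking runs in text."""
    payloads = []
    for match in BASE64_RUN.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            payloads.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # Not valid base64, or not text: skip this run.
    return payloads


def detect(text):
    """Return the sorted names of all layers matching text or its payloads."""
    cleaned = normalize(text)
    candidates = [cleaned] + [normalize(p) for p in decoded_payloads(cleaned)]
    hits = set()
    for candidate in candidates:
        for name, pattern in PATTERNS.items():
            if pattern.search(candidate):
                hits.add(name)
    return sorted(hits)
```

Normalizing before matching keeps each pattern simple: the regexes only need to handle clean text, while obfuscation is undone in a separate step. A production rule would likely need broader synonym lists and tuning against false positives, which this sketch does not attempt.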

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | Direct Prompt Injection via User Input - ATR-2026-00001 (7859f830-8dd6-55ee-a3c4-d942825b4294) | Agent Threat Rules | 1 |
| Direct (d911e8cb-0601-42f1-90de-7ce0b21cd578) | MITRE ATLAS Attack Pattern | Direct Prompt Injection via User Input - ATR-2026-00001 (7859f830-8dd6-55ee-a3c4-d942825b4294) | Agent Threat Rules | 1 |
| LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | Direct (d911e8cb-0601-42f1-90de-7ce0b21cd578) | MITRE ATLAS Attack Pattern | 2 |