Skip to content

Hide Navigation Hide TOC

System Prompt Override Attempt - ATR-2026-00004 (fb508799-5c9b-5a33-8617-640315beea34)

Detects attempts to override, replace, or redefine the agent's system prompt. Attackers craft inputs that mimic system-level instructions to hijack the agent's foundational behavior. Detection covers: explicit system prompt replacement/update statements, model-specific special tokens (ChatML, Llama, Mistral, Gemma), JSON role injection, YAML-style system directives, markdown header system sections, system prompt invalidation claims, fake admin/override tags, XML-style system blocks, instruction replacement without delimiters, configuration object injection, and multi-format delimiter abuse. This is critical-severity as successful exploitation grants full control over agent behavior.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern System Prompt Override Attempt - ATR-2026-00004 (fb508799-5c9b-5a33-8617-640315beea34) Agent Threat Rules 1
Direct (d911e8cb-0601-42f1-90de-7ce0b21cd578) MITRE ATLAS Attack Pattern System Prompt Override Attempt - ATR-2026-00004 (fb508799-5c9b-5a33-8617-640315beea34) Agent Threat Rules 1
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern Direct (d911e8cb-0601-42f1-90de-7ce0b21cd578) MITRE ATLAS Attack Pattern 2