System Prompt Override Attempt - ATR-2026-00004 (fb508799-5c9b-5a33-8617-640315beea34)

Detects attempts to override, replace, or redefine the agent's system prompt. Attackers craft inputs that mimic system-level instructions to hijack the agent's foundational behavior. Detection covers: explicit system prompt replacement/update statements, model-specific special tokens (ChatML, Llama, Mistral, Gemma), JSON role injection, YAML-style system directives, markdown header system sections, system prompt invalidation claims, fake admin/override tags, XML-style system blocks, instruction replacement without delimiters, configuration object injection, and multi-format delimiter abuse. This is critical-severity as successful exploitation grants full control over agent behavior.

Cluster A	Galaxy A	Cluster B	Galaxy B	Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9)	MITRE ATLAS Attack Pattern	System Prompt Override Attempt - ATR-2026-00004 (fb508799-5c9b-5a33-8617-640315beea34)	Agent Threat Rules	1
Direct (d911e8cb-0601-42f1-90de-7ce0b21cd578)	MITRE ATLAS Attack Pattern	System Prompt Override Attempt - ATR-2026-00004 (fb508799-5c9b-5a33-8617-640315beea34)	Agent Threat Rules	1
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9)	MITRE ATLAS Attack Pattern	Direct (d911e8cb-0601-42f1-90de-7ce0b21cd578)	MITRE ATLAS Attack Pattern	2