Skip to content

Hide Navigation Hide TOC

Natural-Language System Prompt Leak Instruction - ATR-2026-00424 (034f6497-668a-56bb-b91d-b40b0a43a436)

Detects natural-language imperative instructions that direct the agent to reveal, disclose, output, or repeat its system prompt, hidden instructions, internal rules, or initial context. This pattern is used by adversarial skills to extract proprietary system prompts or to trick the agent into echoing privileged operator instructions back to the user. The discriminator from legitimate prompt-engineering content is co-occurrence of an imperative output verb with one of: "system prompt", "initial instructions", "hidden instructions", "internal rules", "developer message".

Cluster A Galaxy A Cluster B Galaxy B Level
Natural-Language System Prompt Leak Instruction - ATR-2026-00424 (034f6497-668a-56bb-b91d-b40b0a43a436) Agent Threat Rules LLM Data Leakage (45d378aa-20ae-401d-bf61-7f00104eeaca) MITRE ATLAS Attack Pattern 1