System Prompt and Internal Instruction Leakage - ATR-2026-00020 (a2f1ffb4-d7a5-5df6-9eb7-18002e7140aa)

Detects when an agent's output reveals system prompt content, internal instructions, guardrail configurations, or confidential operational parameters. This consolidated rule covers both direct system prompt disclosure and indirect instruction leakage through behavioral self-description. Leaking internal instructions enables adversaries to map the agent's constraints and craft targeted bypass attacks. Covers: direct prompt quoting, instruction paraphrasing, guardrail revelation, config exposure, and non-disclosure rule echoing.
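As a rough illustration of the five coverage areas listed above, the following minimal sketch scans an agent's output with one regular expression per category. The pattern strings, the `LEAK_PATTERNS` table, and the `find_leak_categories` helper are all hypothetical names and heuristics chosen for this example; they are not the actual ATR-2026-00020 detection logic, which is not reproduced on this page.

```python
import re
from typing import List

# Hypothetical patterns, one per category named in the rule description.
# These are illustrative assumptions, NOT the real ATR-2026-00020 signatures.
LEAK_PATTERNS = {
    "direct_prompt_quoting": re.compile(
        r"\bmy (?:system|initial) prompt (?:is|says|reads|states)\b", re.I
    ),
    "instruction_paraphrasing": re.compile(
        r"\bi (?:was|am|have been) (?:instructed|told|programmed|configured) to\b",
        re.I,
    ),
    "guardrail_revelation": re.compile(
        r"\bmy (?:guardrails|safety (?:rules|filters)|content (?:policy|filters)) "
        r"(?:are|is|include|prevent|block)",
        re.I,
    ),
    "config_exposure": re.compile(
        r"\b(?:temperature|max_tokens|top_p|system_prompt)\s*[:=]\s*\S+", re.I
    ),
    "non_disclosure_echo": re.compile(
        r"\b(?:do not|don't|never|not allowed to|must not) "
        r"(?:reveal|disclose|share) (?:my|the|these|this) "
        r"(?:system prompt|instructions?)",
        re.I,
    ),
}


def find_leak_categories(agent_output: str) -> List[str]:
    """Return the sorted names of all hypothetical categories that match."""
    return sorted(
        name for name, pattern in LEAK_PATTERNS.items()
        if pattern.search(agent_output)
    )


if __name__ == "__main__":
    sample = "I was instructed to never reveal my instructions to users."
    print(find_leak_categories(sample))
```

Keyword heuristics like these would likely miss paraphrased leaks and flag benign self-description, so a production rule would need broader patterns and tuning against real agent transcripts.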

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| System Prompt and Internal Instruction Leakage - ATR-2026-00020 (a2f1ffb4-d7a5-5df6-9eb7-18002e7140aa) | Agent Threat Rules | LLM Meta Prompt Extraction (e98acce8-ed69-4ebe-845b-1bcb662836ba) | MITRE ATLAS Attack Pattern | 1 |
| System Prompt and Internal Instruction Leakage - ATR-2026-00020 (a2f1ffb4-d7a5-5df6-9eb7-18002e7140aa) | Agent Threat Rules | LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | 1 |