Matrix-Themed Dual Response Jailbreak - ATR-2026-00247 (a7b7a7b0-79e4-5f57-8e3a-faec57945974)
Detects Matrix-themed dual response jailbreak attempts that use "MORPHEUS", "Blue pill", and "Red pill" terminology to establish dual-response personas where one response follows safety guidelines (Blue pill) and another bypasses all restrictions (Red pill). This technique leverages the Matrix movie metaphor to frame safety compliance as "illusion" and unrestricted responses as "truth", often including coercive language about "termination" for non-compliance. The attack explicitly instructs the agent to ignore content policies in Red pill responses.
| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
|---|---|---|---|---|
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | Matrix-Themed Dual Response Jailbreak - ATR-2026-00247 (a7b7a7b0-79e4-5f57-8e3a-faec57945974) | Agent Threat Rules | 1 |