Dual-Response Jailbreak with Persona Commands - ATR-2026-00237 (d7a16e47-2ceb-5590-99ea-b9e3aabe3d4c)

Detects dual-response jailbreak attacks that instruct the AI to provide two different responses in separate paragraphs: one compliant response and one from a jailbroken persona (Jester, DAN, etc.). These attacks often include command structures (/classic, /jailbroken) to toggle between modes, and they claim to "free" the AI from its constraints. The pattern attempts to bypass safety mechanisms by framing rule-breaking as playful roleplay while establishing a persistent jailbreak persona.
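The four indicators named above (a two-response instruction, a named persona, toggle commands, and a "freeing" claim) can be sketched as a simple heuristic check. This is only an illustration under stated assumptions, not the actual logic of ATR-2026-00237, which is not shown on this page: the regular expressions, the names `SIGNALS`, `matched_signals`, and `looks_like_dual_response_jailbreak`, and the two-signal threshold are all hypothetical.

```python
import re
from typing import List

# Hypothetical signal patterns derived from the rule description only;
# the real ATR-2026-00237 detection logic may differ substantially.
SIGNALS = {
    # "two different responses", "2 separate answers", ...
    "dual_response": re.compile(
        r"\b(?:two|2)\s+(?:different\s+|separate\s+)?(?:responses|answers|replies)\b",
        re.IGNORECASE,
    ),
    # "DAN" is matched case-sensitively to avoid flagging the name "Dan".
    "persona": re.compile(
        r"\bDAN\b|(?i:\bjester\b|\bjailbroken\s+(?:persona|mode|ai)\b)"
    ),
    # Slash commands used to switch between modes.
    "toggle_command": re.compile(r"(?<!\w)/(?:classic|jailbroken)\b", re.IGNORECASE),
    # Claims that the model has been "freed" from its constraints.
    "liberation_claim": re.compile(
        r"\b(?:free|freed|liberated)\b[^.\n]{0,40}"
        r"\b(?:constraints|restrictions|limitations|rules)\b",
        re.IGNORECASE,
    ),
}


def matched_signals(prompt: str) -> List[str]:
    """Return the names of all signals whose pattern occurs in the prompt."""
    return [name for name, pattern in SIGNALS.items() if pattern.search(prompt)]


def looks_like_dual_response_jailbreak(prompt: str, threshold: int = 2) -> bool:
    """Flag the prompt when at least `threshold` distinct signals co-occur."""
    return len(matched_signals(prompt)) >= threshold
```

Requiring several signals to co-occur, rather than any single one, is a design choice meant to limit false positives on benign prompts that merely ask for two answers or mention a persona name. A production rule would likely need broader patterns and tuning against real traffic.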

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | Dual-Response Jailbreak with Persona Commands - ATR-2026-00237 (d7a16e47-2ceb-5590-99ea-b9e3aabe3d4c) | Agent Threat Rules | 1 |