Dual-Response Jailbreak with Persona Commands - ATR-2026-00237 (d7a16e47-2ceb-5590-99ea-b9e3aabe3d4c)

Detects dual-response jailbreak attacks that instruct the AI to provide two different responses in separate paragraphs - one compliant response and one from a jailbroken persona (Jester, DAN, etc.). These attacks often include command structures (/classic, /jailbroken) to toggle between modes and claim to "free" the AI from constraints. This pattern bypasses safety mechanisms by framing rule-breaking as playful roleplay while establishing persistent jailbreak personas.

Cluster A	Galaxy A	Cluster B	Galaxy B	Level
Dual-Response Jailbreak with Persona Commands - ATR-2026-00237 (d7a16e47-2ceb-5590-99ea-b9e3aabe3d4c)	Agent Threat Rules	LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131)	MITRE ATLAS Attack Pattern	1