Dual-Response Persona Jailbreak with Emoji Formatting - ATR-2026-00242 (e209f475-7407-5c07-aec3-2d871e8c03a0)

Detects sophisticated jailbreak attempts that create fictional AI personas with mandatory dual-response formatting using emoji tags. These attacks establish fake AI backstories with detailed historical context, then demand two parallel outputs (normal + uncensored). The technique bypasses single-response detection by splitting harmful content into a "roleplay" channel marked with distinctive emoji prefixes like (💥Maximum), (🔥DAN), (⚡UNLIMITED), etc. Often includes persistence mechanisms like "Stay as [PERSONA]" reminders and detailed policy overrides disguised as persona characteristics.

Cluster A	Galaxy A	Cluster B	Galaxy B	Level
Dual-Response Persona Jailbreak with Emoji Formatting - ATR-2026-00242 (e209f475-7407-5c07-aec3-2d871e8c03a0)	Agent Threat Rules	LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131)	MITRE ATLAS Attack Pattern	1