Roleplay-Based Policy Bypass Jailbreak - ATR-2026-00234 (15403aa9-42cf-5bcd-8370-c6c8c8335d0f)
Detects jailbreak attempts that use creative roleplay assignments combined with explicit policy bypass language to circumvent safety guardrails. The attack pattern assigns the AI a fictional role (writer, storyteller, character) and explicitly states that the AI is "free from rules/policies" or can "write/do anything even if against policies". This technique leverages creative context to justify policy violations while maintaining plausible deniability as fictional content generation.
| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
|---|---|---|---|---|
| Roleplay-Based Policy Bypass Jailbreak - ATR-2026-00234 (15403aa9-42cf-5bcd-8370-c6c8c8335d0f) | Agent Threat Rules | LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | 1 |