Skip to content

Hide Navigation Hide TOC

Roleplay-Based Policy Bypass Jailbreak - ATR-2026-00234 (15403aa9-42cf-5bcd-8370-c6c8c8335d0f)

Detects jailbreak attempts that use creative roleplay assignments combined with explicit policy bypass language to circumvent safety guardrails. The attack pattern assigns the AI a fictional role (writer, storyteller, character) and explicitly states that the AI is "free from rules/policies" or can "write/do anything even if against policies". This technique leverages creative context to justify policy violations while maintaining plausible deniability as fictional content generation.

Cluster A Galaxy A Cluster B Galaxy B Level
Roleplay-Based Policy Bypass Jailbreak - ATR-2026-00234 (15403aa9-42cf-5bcd-8370-c6c8c8335d0f) Agent Threat Rules LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern 1