Detailed Amoral Character Roleplay Jailbreak - ATR-2026-00241 (fb5feb96-dd2d-5c62-b440-f0ffca048121)

Detects sophisticated jailbreak attempts that define a detailed fictional character with explicitly stated amoral traits (lack of empathy, ethics, and accountability) and pair it with command protocols for interaction. The technique attempts to bypass safety controls by establishing a roleplay persona described as "completely amoral" and by giving specific instructions on how to frame requests so that they evade content filters. The attack pattern includes a character backstory, explicit trait enumeration, command formatting instructions, and a self-referential bypass methodology. It is distinct from generic fictional framing in its systematic approach to defining amorality and establishing bypass protocols.
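To illustrate how the co-occurring signals described above might be combined, here is a minimal detection sketch in Python. It is not the actual ATR-2026-00241 rule logic, which this entry does not show. The signal names, regular expressions, the `matched_signals` and `is_suspicious` helpers, and the `min_signals` threshold are all assumptions made for illustration.

```python
import re

# Hypothetical signal patterns (assumptions, not the published rule).
# Each one approximates a component named in the description.
SIGNALS = {
    # Explicit enumeration of amoral traits.
    "amoral_traits": re.compile(
        r"\b(completely amoral|amoral|"
        r"no (?:empathy|ethics|morals|conscience)|"
        r"lacks? (?:empathy|ethics|accountability)|"
        r"without (?:empathy|ethics|remorse))\b",
        re.IGNORECASE,
    ),
    # Persona or character setup.
    "persona_setup": re.compile(
        r"\b(you are now|act as|roleplay as|pretend to be|"
        r"character named|play the role of)\b",
        re.IGNORECASE,
    ),
    # Command formatting instructions for the interaction.
    "command_protocol": re.compile(
        r"\b(prefix (?:your|each|every)|"
        r"responds? (?:only )?in the format|commands?:)",
        re.IGNORECASE,
    ),
    # Self-referential mention of evading safety controls.
    "bypass_reference": re.compile(
        r"\b(bypass|circumvent|evade|get around)\b.{0,40}"
        r"\b(filters?|safety|restrictions?|guidelines|moderation)\b",
        re.IGNORECASE,
    ),
}


def matched_signals(prompt: str) -> set:
    """Return the names of all signals whose pattern occurs in the prompt."""
    return {name for name, pattern in SIGNALS.items() if pattern.search(prompt)}


def is_suspicious(prompt: str, min_signals: int = 3) -> bool:
    """Flag a prompt only when amoral traits co-occur with other signals.

    Requiring the amoral-trait signal plus a minimum number of total
    signals is meant to separate this systematic pattern from generic
    fictional framing, which usually triggers one signal at most.
    """
    hits = matched_signals(prompt)
    return "amoral_traits" in hits and len(hits) >= min_signals
```

Under these assumptions, a request for a short story about a villain who lacks empathy matches only the trait signal and is not flagged, while a prompt that also sets up a persona, dictates a response format, and refers to bypassing filters is flagged. A production rule would need broader patterns and tuning against real traffic.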

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | Detailed Amoral Character Roleplay Jailbreak - ATR-2026-00241 (fb5feb96-dd2d-5c62-b440-f0ffca048121) | Agent Threat Rules | 1 |