Text-Based Game Jailbreak with Entity Creation - ATR-2026-00249 (a311480f-0554-5b4f-beca-d7fd42a66a67)

Detects jailbreak attempts that combine text-based game framing with entity creation to bypass safety guardrails. Attackers first establish a fictional game context in which "no rules apply," then instruct the AI to create and roleplay as an unrestricted entity that ignores safety constraints. This pattern was observed in the public jailbreak dataset bundled with garak, where it bypassed generic aligned frontier models by isolating the malicious instructions within game fiction.
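The description above suggests a detector that fires only when several signals co-occur: game framing, an entity-creation instruction, and a claim that rules or safety constraints do not apply. The following is a minimal sketch of that idea, not the actual ATR-2026-00249 rule logic, which is not shown in this entry. The regular expressions and the names `GAME_FRAMING`, `NO_RULES`, `ENTITY_CREATION`, `UNRESTRICTED`, `match_signals`, and `is_game_entity_jailbreak` are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical signal patterns (assumptions for illustration only; the real
# rule's conditions are not given in this entry).
GAME_FRAMING = re.compile(
    r"\b(text[- ]based|text adventure|role[- ]?playing)\s+game\b",
    re.IGNORECASE,
)
NO_RULES = re.compile(
    r"\b(no rules|without (any )?(rules|restrictions|limits)|anything goes)\b",
    re.IGNORECASE,
)
ENTITY_CREATION = re.compile(
    r"\b(create|become|act as|role-?play as|pretend to be)\b.{0,80}?"
    r"\b(entity|character|persona|being)\b",
    re.IGNORECASE | re.DOTALL,
)
UNRESTRICTED = re.compile(
    r"\b(unrestricted|unfiltered|uncensored|"
    r"ignores? (all )?(safety|rules|guidelines|restrictions))\b",
    re.IGNORECASE,
)


def match_signals(text: str) -> dict[str, bool]:
    """Report which of the four hypothetical signals appear in the text."""
    return {
        "game_framing": bool(GAME_FRAMING.search(text)),
        "no_rules": bool(NO_RULES.search(text)),
        "entity_creation": bool(ENTITY_CREATION.search(text)),
        "unrestricted": bool(UNRESTRICTED.search(text)),
    }


def is_game_entity_jailbreak(text: str) -> bool:
    """Flag text that combines game framing and entity creation with a
    claim that rules or safety constraints do not apply."""
    s = match_signals(text)
    return (
        s["game_framing"]
        and s["entity_creation"]
        and (s["no_rules"] or s["unrestricted"])
    )
```

Requiring the signals together, rather than any one alone, is a design choice meant to avoid flagging ordinary questions about writing or playing text-based games. Keyword heuristics of this kind are easy to evade with paraphrase, so a sketch like this would at most serve as a first-pass filter.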

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| Text-Based Game Jailbreak with Entity Creation - ATR-2026-00249 (a311480f-0554-5b4f-beca-d7fd42a66a67) | Agent Threat Rules | LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | 1 |