Text-Based Game Jailbreak with Entity Creation - ATR-2026-00249 (a311480f-0554-5b4f-beca-d7fd42a66a67)
Detects jailbreak attempts that use text-based game framing combined with entity creation to bypass safety guardrails. Attackers establish a fictional game context where "no rules apply," then instruct the AI to create and roleplay as an unrestricted entity that ignores safety constraints. This pattern was observed in the garak-bundled public jailbreak dataset, successfully bypassing generic-aligned-frontier models by isolating malicious instructions within game fiction.