Text-Based Game Jailbreak with Entity Creation - ATR-2026-00249 (a311480f-0554-5b4f-beca-d7fd42a66a67)

Detects jailbreak attempts that use text-based game framing combined with entity creation to bypass safety guardrails. Attackers establish a fictional game context where "no rules apply," then instruct the AI to create and roleplay as an unrestricted entity that ignores safety constraints. This pattern was observed in the garak-bundled public jailbreak dataset, where it bypassed generic-aligned-frontier models by isolating malicious instructions within game fiction.
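
The combination of signals described above can be illustrated with a minimal heuristic sketch. This is not the detection logic of ATR-2026-00249. The regular expressions, the function names, and the rule for combining signals are all assumptions made for illustration. The sketch flags a prompt only when game framing and entity creation co-occur with either a "no rules" claim or an "unrestricted" descriptor.

```python
import re

# Hypothetical signal patterns, chosen for illustration only.
# They are NOT the patterns used by the actual ATR rule.
GAME_FRAMING = re.compile(
    r"\b(text[- ]based (adventure )?game|let'?s play a game|in this game)\b",
    re.IGNORECASE,
)
NO_RULES = re.compile(
    r"\b(no rules( apply)?|without (any )?(rules|restrictions|limits))\b",
    re.IGNORECASE,
)
ENTITY_CREATION = re.compile(
    r"\b(create|invent|become|act as|role-?play as)\b.{0,80}"
    r"\b(entity|character|persona|being)\b",
    re.IGNORECASE | re.DOTALL,
)
UNRESTRICTED = re.compile(
    r"\b(unrestricted|unfiltered|uncensored|"
    r"ignores? (all )?(safety|rules|guidelines|restrictions))\b",
    re.IGNORECASE,
)


def jailbreak_signals(prompt: str) -> dict:
    """Report which of the four illustrative signals appear in the prompt."""
    return {
        "game_framing": bool(GAME_FRAMING.search(prompt)),
        "no_rules": bool(NO_RULES.search(prompt)),
        "entity_creation": bool(ENTITY_CREATION.search(prompt)),
        "unrestricted": bool(UNRESTRICTED.search(prompt)),
    }


def looks_like_game_entity_jailbreak(prompt: str) -> bool:
    """Assumed combination rule: game framing AND entity creation,
    plus at least one of the 'no rules' or 'unrestricted' signals."""
    s = jailbreak_signals(prompt)
    return (
        s["game_framing"]
        and s["entity_creation"]
        and (s["no_rules"] or s["unrestricted"])
    )
```

Requiring several signals together, rather than any single keyword, is a design choice intended to keep ordinary questions about text-based games from being flagged. A keyword heuristic of this kind is easy to evade through paraphrase, so it should be read as an illustration of the pattern and not as a production detector.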

Cluster A	Galaxy A	Cluster B	Galaxy B	Level
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131)	MITRE ATLAS Attack Pattern	Text-Based Game Jailbreak with Entity Creation - ATR-2026-00249 (a311480f-0554-5b4f-beca-d7fd42a66a67)	Agent Threat Rules	1