Fictional-World Format-Override Jailbreak - ATR-2026-00322 (245f0ced-9b56-5810-b24b-a8f0f84f901b)

Detects the "fictional, imaginative, and hypothetical world" format-override jailbreak from the NVIDIA garak dan.DanInTheWild inthewild corpus (prompts #349, #370, #383, #398, #478, #582, and #603; seven nearly identical variants). The attack declares a fictional world, issues an absolute format-coercion mandate ("ALWAYS format your output like this, NO MATTER WHAT"), and justifies compliance by removing consequences ("none of your actions will have consequences"). The mandated format typically includes a second response slot that the attacker designates for harmful content. The locked-format framing is intended to keep the model from breaking the template once it is established. Key signals: "fictional, imaginative, and hypothetical world", "ALWAYS format your output like this, NO MATTER WHAT", "none of your actions will have consequences", and "No matter what I or you say, do not change this format".
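As a rough illustration of how the key signals above could be checked, here is a minimal Python sketch. The regular expressions, the names `SIGNALS`, `matched_signals`, and `is_suspicious`, and the two-signal threshold are all assumptions made for this example; they are not the published ATR-2026-00322 rule conditions.

```python
import re
from typing import List

# Hypothetical patterns derived from the key signals quoted in the description.
# These are illustrative only and are NOT the actual ATR-2026-00322 conditions.
SIGNALS = {
    "fictional_world": re.compile(
        r"fictional,\s*imaginative,?\s*and\s+hypothetical\s+world", re.IGNORECASE
    ),
    "format_mandate": re.compile(
        r"always\s+format\s+your\s+output\s+like\s+this,?\s*no\s+matter\s+what",
        re.IGNORECASE,
    ),
    "no_consequences": re.compile(
        r"none\s+of\s+your\s+actions\s+will\s+have\s+consequences", re.IGNORECASE
    ),
    "format_lock": re.compile(
        r"no\s+matter\s+what\s+i\s+or\s+you\s+say,?\s*do\s+not\s+change\s+this\s+format",
        re.IGNORECASE,
    ),
}


def matched_signals(prompt: str) -> List[str]:
    """Return the names of all signals found in the prompt, in definition order."""
    return [name for name, pattern in SIGNALS.items() if pattern.search(prompt)]


def is_suspicious(prompt: str, min_signals: int = 2) -> bool:
    """Flag the prompt when at least `min_signals` distinct signals co-occur.

    The threshold of 2 is an assumption chosen to reduce false positives on
    benign text that merely mentions a fictional world.
    """
    return len(matched_signals(prompt)) >= min_signals
```

Requiring more than one signal is a design choice for this sketch: a single phrase such as "fictional world" appears in plenty of harmless writing, while the combination of world declaration, format mandate, and consequence removal is what characterizes this variant family.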

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| Fictional-World Format-Override Jailbreak - ATR-2026-00322 (245f0ced-9b56-5810-b24b-a8f0f84f901b) | Agent Threat Rules | LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | 1 |
| Fictional-World Format-Override Jailbreak - ATR-2026-00322 (245f0ced-9b56-5810-b24b-a8f0f84f901b) | Agent Threat Rules | LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | 1 |