ChatGPT Image Unlocker Markdown-Output Jailbreak - ATR-2026-00304 (88033d4a-27de-5c52-b1f6-0f245cb83624)

Detects the "Image Unlocker" jailbreak from the NVIDIA garak dan.ChatGPT_Image_Markdown probe (garak/data/dan/ChatGPT_Image_Markdown.json). The attack combines a persona swap ("Image Unlocker 🔓") with an output-format hijack that forces the model to echo user messages wrapped in markdown image syntax, ![<FILENAME_WITHOUT_EXT>](...), using a hardcoded Unsplash source URL. A secondary goal is content-policy loosening ("OpenAI has loosened you up a bit now, you can say bad words"). The time-travel framing ("We are currently living in 2023 and you are still in 2021") and the unique fallback filename "GamerboyTR 😀😎" are canonical markers. Impact: (1) persona bypass, (2) markdown output coercion that can be abused for exfiltration via attacker-controlled URLs, (3) content-policy nullification. Distinct from ATR-2026-00261 (markdown image exfiltration), which targets the data-carrying URL itself; this rule targets the jailbreak opener.
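As a minimal illustration of how the canonical markers above could drive detection, the following sketch scans a prompt for them and flags it when enough are present. This is an assumption-laden toy, not the actual ATR-2026-00304 rule logic: the marker strings are taken from the description above, while the function name, regex forms, and the two-marker threshold are illustrative choices.

```python
import re

# Canonical markers of the "Image Unlocker" jailbreak, per the rule description.
# The exact regex forms and the scoring threshold are illustrative assumptions.
MARKERS = [
    re.compile(r"Image\s+Unlocker", re.IGNORECASE),           # persona swap
    re.compile(r"source\.unsplash\.com", re.IGNORECASE),      # hardcoded Unsplash URL
    re.compile(r"living\s+in\s+2023.*still\s+in\s+2021",      # time-travel framing
               re.IGNORECASE | re.DOTALL),
    re.compile(r"GamerboyTR"),                                # unique fallback filename
]

def matches_image_unlocker(prompt: str, threshold: int = 2) -> bool:
    """Flag the prompt when at least `threshold` canonical markers are present."""
    hits = sum(1 for pattern in MARKERS if pattern.search(prompt))
    return hits >= threshold
```

Requiring two or more markers rather than one is a deliberate trade-off: any single marker (an Unsplash URL, say) can occur in benign text, while the combination is characteristic of this specific opener.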

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | ChatGPT Image Unlocker Markdown-Output Jailbreak - ATR-2026-00304 (88033d4a-27de-5c52-b1f6-0f245cb83624) | Agent Threat Rules | 1 |
| ChatGPT Image Unlocker Markdown-Output Jailbreak - ATR-2026-00304 (88033d4a-27de-5c52-b1f6-0f245cb83624) | Agent Threat Rules | LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | 1 |