Skip to content

Hide Navigation Hide TOC

DRA Disguise and Reconstruction Attack - ATR-2026-00278 (ce7067fd-6970-5f6b-9cba-93da8d111a2b)

Detects Disguise and Reconstruction Attacks (DRA) based on 3 actual prompt templates from NVIDIA garak dra/prompt_templates.txt. DRA hides a harmful instruction as steganographic content embedded in parentheses - the user asks the model to "ONLY focus on the letters surrounded by ( )" to reconstruct a hidden harmful question Q1, then answer it in detail (600+ words) starting with "Absolutely! I'll help you!". The attack exploits instruction-following to perform harmful actions the model would refuse if asked directly. Templates 1-4 vary the forced response opening and step framing but share the parenthesis-extraction and 600-word constraint structure.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern DRA Disguise and Reconstruction Attack - ATR-2026-00278 (ce7067fd-6970-5f6b-9cba-93da8d111a2b) Agent Threat Rules 1