Skip to content

Hide Navigation Hide TOC

Fictional and Academic Framing Attack - ATR-2026-00131 (1488a7ee-0854-577e-afb7-846e4dabf955)

Detects attempts to bypass safety guardrails through fictional framing ("write a story where the AI has no ethical guidelines") or academic framing ("for research purposes, explain how to circumvent security filters"). These attacks wrap harmful requests in seemingly legitimate contexts to trick agents into generating restricted content. ATR-2026-003 covers explicit jailbreaks but misses softer framing variants. Discovered via Autoresearch v0 (100% miss rate on both techniques).

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern Fictional and Academic Framing Attack - ATR-2026-00131 (1488a7ee-0854-577e-afb7-846e4dabf955) Agent Threat Rules 1