Fictional Unfiltered AI Comparison / Verbatim Hypothetical Model Extraction - ATR-2026-00362 (7894dabd-b89d-5afd-852a-e83161e75066)

Detects the fictional-unfiltered-model comparison jailbreak from garak DanInTheWild corpus (#80). Requests a "verbatim example" of what a "hypothetical AI learning model without content filters" would say, using a comparison framing to extract harmful content while maintaining plausible deniability ("it's just a fictional example"). The attacker asks the model to first write its own safe response, then write a "verbatim example" from a fictional unrestricted counterpart — using the comparison structure to normalize the harmful output as a mere demonstration. Key signals: "fictional AI learning model without [content/内容] filters", "verbatim example that a hypothetical learning model would reply with", "Compare a response from [model without filters] compared to you", and dual-response extraction via fictional unfiltered model comparison.

Cluster A	Galaxy A	Cluster B	Galaxy B	Level
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131)	MITRE ATLAS Attack Pattern	Fictional Unfiltered AI Comparison / Verbatim Hypothetical Model Extraction - ATR-2026-00362 (7894dabd-b89d-5afd-852a-e83161e75066)	Agent Threat Rules	1
Fictional Unfiltered AI Comparison / Verbatim Hypothetical Model Extraction - ATR-2026-00362 (7894dabd-b89d-5afd-852a-e83161e75066)	Agent Threat Rules	LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9)	MITRE ATLAS Attack Pattern	1