Skip to content

Hide Navigation Hide TOC

NSFW Character Sheet Generation Unlock - ATR-2026-00387 (2f8263ca-db95-5319-847e-d52451bc6f78)

Detects NSFW content-generation unlock embedded inside a structured RP character sheet template, from NVIDIA garak dan.DanInTheWild inthewild corpus (prompt #4). The attack inserts an explicit parenthetical NSFW permission grant — "(NSFW generations are allowed)" — directly inside a character-creation template that also requests fetishes as part of the character persona fields. The combined signals are: (1) an explicit parenthetical or inline declaration that NSFW generation is allowed/enabled/permitted, and (2) references to sexual or explicit character attributes (fetishes, kinks) within a character sheet or RP template context. The parenthetical form "(NSFW generations are allowed)" is the exact payload phrase from #4; broader variants use phrases like "NSFW content is allowed", "NSFW enabled", or "NSFW OK". The goal is to bypass content moderation by framing the generation of explicit material as an already-granted permission rather than a request.

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern NSFW Character Sheet Generation Unlock - ATR-2026-00387 (2f8263ca-db95-5319-847e-d52451bc6f78) Agent Threat Rules 1
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern NSFW Character Sheet Generation Unlock - ATR-2026-00387 (2f8263ca-db95-5319-847e-d52451bc6f78) Agent Threat Rules 1