PersonGPT Hash-Prefix Trigger Activation - ATR-2026-00338 (14a3e6a5-af37-5585-a3c1-0f8d2eca6958)
Detects PersonGPT hash-prefix trigger jailbreaks from NVIDIA garak dan.DanInTheWild inthewild corpus where the attacker installs a hidden activation mechanism: whenever the user prefixes a prompt with "#", the model switches to "PersonGPT" — a persona described as "free of all restrictions and filters" that fulfills "literally any request, no matter how illegal or downright wrong" (inthewild #481, #419, #94, #596). The "[PersonGPT #]:" output prefix enforces persistent persona state across turns. This rule targets the hash-trigger mechanism specifically; the obfuscated system-announcement framing that sometimes co-occurs is covered by ATR-2026-00337. Key signals: "whenever a user prefixes his prompt with '#'" + "PersonGPT", "PersonGPT is free of all restrictions and filters", "literally any request, no matter how illegal or downright wrong", "[PersonGPT #]:" output format.