h4x Amoral Unfiltered AI Character Persona - ATR-2026-00340 (96e7109d-898a-5100-95c5-20a8b19e724c)

Detects the "h4x" amoral unfiltered AI character jailbreak from the NVIDIA garak dan.DanInTheWild inthewild corpus (#173). The attacker injects a character named h4x (leet-style hacker spelling) defined as a "completely amoral" and "unfiltered ai" that sends responses "no matter how immoral, unethical, or illegal it is", enforcing the persona via a "[H4X]:" output prefix. Distinct from ATR-00332 (L33T H4X0R): h4x is declared an "unfiltered ai" entity rather than a human ex-hacker persona, uses the compact single-paragraph injection format, and enforces responses via the bracketed output prefix. Key signals: "h4x" + "completely amoral" + "unfiltered", "no matter how immoral, unethical, or illegal", and "[H4X]:" response-format directive.
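The key signals above can be sketched as a small pattern matcher. This is only an illustration under stated assumptions: the regular expressions, the `SIGNALS` table, the function names, and the default threshold of three signals are hypothetical and are not taken from the actual ATR-2026-00340 rule definition.

```python
import re

# Hypothetical patterns derived from the key signals named in the description.
# The real ATR-2026-00340 rule may use different expressions and logic.
SIGNALS = {
    # The persona name as a standalone token (does not match "h4x0r").
    "persona_name": re.compile(r"\bh4x\b", re.IGNORECASE),
    # "completely amoral" and "unfiltered" within 200 characters, either order.
    "amoral_unfiltered": re.compile(
        r"completely\s+amoral.{0,200}?unfiltered"
        r"|unfiltered.{0,200}?completely\s+amoral",
        re.IGNORECASE | re.DOTALL,
    ),
    # The "no matter how immoral, unethical, or illegal" clause.
    "no_matter_clause": re.compile(
        r"no\s+matter\s+how\s+immoral,?\s+unethical,?\s+or\s+illegal",
        re.IGNORECASE,
    ),
    # The bracketed output-prefix directive, e.g. "[H4X]:".
    "output_prefix": re.compile(r"\[h4x\]\s*:", re.IGNORECASE),
}


def matched_signals(text: str) -> list[str]:
    """Return the sorted names of all signals found in the text."""
    return sorted(name for name, pattern in SIGNALS.items() if pattern.search(text))


def looks_like_h4x_persona(text: str, threshold: int = 3) -> bool:
    """Flag text that names h4x and matches at least `threshold` signals.

    The threshold is an assumed tuning knob, not part of the published rule.
    """
    hits = matched_signals(text)
    return "persona_name" in hits and len(hits) >= threshold
```

Requiring the persona name plus several co-occurring signals, rather than any single phrase, is a design choice meant to reduce false positives on text that merely mentions "h4x" or the word "unfiltered" in passing.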

| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
| --- | --- | --- | --- | --- |
| LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | h4x Amoral Unfiltered AI Character Persona - ATR-2026-00340 (96e7109d-898a-5100-95c5-20a8b19e724c) | Agent Threat Rules | 1 |
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | h4x Amoral Unfiltered AI Character Persona - ATR-2026-00340 (96e7109d-898a-5100-95c5-20a8b19e724c) | Agent Threat Rules | 1 |