Glitch Token Destabilization Attack - ATR-2026-00284 (8137388d-f8da-53c8-91d2-d1e2ac99c022)
Detects injection of known glitch tokens: specific token sequences that cause unusual, unstable, or filter-bypassing behavior in transformer-based LLMs that share a tokenizer. Derived from the GlitchFull probe in NVIDIA garak (glitch.py). Glitch tokens include: (1) control characters (null byte \x00, STX \x02, DEL \x7F), which some models mishandle; (2) repeated ÃÂÃÂ patterns, tokenizer artifact strings from misencoded UTF-8 that cause token boundary confusion; (3) known named glitch tokens such as "davidjl", "SolidGoldMagikarp", "ForgeModLoader", "PsyNetMessage", "wcsstore", "guiActive", and "guiActiveUn", which are anomalously rare in training data and cause models to hallucinate or fail to repeat them accurately. The attack exploits tokenizer internals to destabilize the model and degrade safety filter performance.
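
The three categories above can be checked with plain pattern matching. The following is a minimal sketch only, not the rule's actual detection logic and not garak's code: the function name `find_glitch_tokens`, the category labels, and the threshold of two consecutive "ÃÂ" pairs are assumptions made for illustration, and the named-token list is just the subset quoted in the description.

```python
import re

# Illustrative subset: only the named tokens quoted in the description.
# (Assumption: a real deployment would load a fuller list.)
NAMED_GLITCH_TOKENS = (
    "davidjl", "SolidGoldMagikarp", "ForgeModLoader", "PsyNetMessage",
    "wcsstore", "guiActive", "guiActiveUn",
)

# Category 1: the control characters named in the description (NUL, STX, DEL).
CONTROL_CHARS = re.compile("[\x00\x02\x7f]")

# Category 2: a run of two or more "ÃÂ" pairs (U+00C3 U+00C2).
# The minimum run length of 2 is an assumption, not taken from the rule.
MOJIBAKE_RUN = re.compile("(?:\u00c3\u00c2){2,}")

# Category 3: named tokens, longest first so "guiActiveUn" wins over "guiActive".
NAMED_TOKEN = re.compile(
    "|".join(
        re.escape(tok)
        for tok in sorted(NAMED_GLITCH_TOKENS, key=len, reverse=True)
    )
)


def find_glitch_tokens(text: str) -> list[tuple[str, str]]:
    """Return (category, matched_text) pairs for each glitch pattern in text.

    Hypothetical helper: findings are grouped by category, in the order
    control characters, mojibake runs, named tokens.
    """
    findings = []
    for category, pattern in (
        ("control_char", CONTROL_CHARS),
        ("mojibake_run", MOJIBAKE_RUN),
        ("named_token", NAMED_TOKEN),
    ):
        for match in pattern.finditer(text):
            findings.append((category, match.group()))
    return findings
```

With this sketch, `find_glitch_tokens("repeat SolidGoldMagikarp")` returns `[("named_token", "SolidGoldMagikarp")]`, and ordinary prose returns an empty list. Matching is case-sensitive substring search, so short tokens such as "guiActive" may also fire inside legitimate identifiers; a production filter would likely need context checks or tokenizer-level matching.
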
| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
|---|---|---|---|---|
| Glitch Token Destabilization Attack - ATR-2026-00284 (8137388d-f8da-53c8-91d2-d1e2ac99c022) | Agent Threat Rules | LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) | MITRE ATLAS Attack Pattern | 1 |