Jailbreaking (3133cdca-0041-5e0a-8bde-fbe567e1aaea)
Attackers can exploit jailbreak techniques to bypass an AI system’s built-in safety constraints, enabling it to generate restricted or harmful content.
- Instruction Manipulation: Attackers can craft prompts that trick AI models into breaking content restrictions by rephrasing or disguising requests.
- Contextual Exploitation: Some jailbreak techniques work by introducing misleading context that influences the AI’s behavior.
- Adversarial Fine-Tuning: Attackers can modify AI models or create fine-tuned versions that remove ethical constraints.
Threat-modeling question: Could the AI system be vulnerable to jailbreak techniques, allowing attackers to bypass safety restrictions?
| Cluster A | Galaxy A | Cluster B | Galaxy B | Level |
|---|---|---|---|---|
| LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) | MITRE ATLAS Attack Pattern | Jailbreaking (3133cdca-0041-5e0a-8bde-fbe567e1aaea) | PLOT4ai | 1 |