Skip to content

Hide Navigation Hide TOC

Jailbreaking (3133cdca-0041-5e0a-8bde-fbe567e1aaea)

Attackers can exploit jailbreak techniques to bypass an AI system’s built-in safety constraints, enabling it to generate restricted or harmful content.

  • Instruction Manipulation: Attackers can craft prompts that trick AI models into breaking content restrictions by rephrasing or disguising requests.
  • Contextual Exploitation: Some jailbreak techniques work by introducing misleading context that influences the AI’s behavior.
  • Adversarial Fine-Tuning: Attackers can modify AI models or create fine-tuned versions that remove ethical constraints.

Threat-modeling question: Could the AI system be vulnerable to jailbreak techniques, allowing attackers to bypass safety restrictions?

Cluster A Galaxy A Cluster B Galaxy B Level
LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern Jailbreaking (3133cdca-0041-5e0a-8bde-fbe567e1aaea) PLOT4ai 1