Skip to content

Hide Navigation Hide TOC

Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540)

When training or fine-tuning a generative AI model it is important to utilize techniques that improve model alignment with safety, security, and content policies.

The fine-tuning process can potentially remove built-in safety mechanisms in a generative AI model, but utilizing techniques such as Supervised Fine-Tuning, Reinforcement Learning from Human Feedback or AI Feedback, and Targeted Safety Context Distillation can improve the safety and alignment of the model.

Cluster A Galaxy A Cluster B Galaxy B Level
Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540) MITRE ATLAS Course of Action LLM Meta Prompt Extraction (e98acce8-ed69-4ebe-845b-1bcb662836ba) MITRE ATLAS Attack Pattern 1
Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540) MITRE ATLAS Course of Action LLM Data Leakage (45d378aa-20ae-401d-bf61-7f00104eeaca) MITRE ATLAS Attack Pattern 1
Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540) MITRE ATLAS Course of Action LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9) MITRE ATLAS Attack Pattern 1
LLM Plugin Compromise (adbb0dd5-ff66-4b2f-869f-bfb3fdb45fc8) MITRE ATLAS Attack Pattern Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540) MITRE ATLAS Course of Action 1
Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540) MITRE ATLAS Course of Action LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131) MITRE ATLAS Attack Pattern 1