Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540)

When training or fine-tuning a generative AI model it is important to utilize techniques that improve model alignment with safety, security, and content policies.

The fine-tuning process can potentially remove built-in safety mechanisms in a generative AI model, but utilizing techniques such as Supervised Fine-Tuning, Reinforcement Learning from Human Feedback or AI Feedback, and Targeted Safety Context Distillation can improve the safety and alignment of the model.

Cluster A	Galaxy A	Cluster B	Galaxy B	Level
LLM Plugin Compromise (adbb0dd5-ff66-4b2f-869f-bfb3fdb45fc8)	MITRE ATLAS Attack Pattern	Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540)	MITRE ATLAS Course of Action	1
Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540)	MITRE ATLAS Course of Action	LLM Jailbreak (172427e3-9ecc-49a3-b628-96b824cc4131)	MITRE ATLAS Attack Pattern	1
Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540)	MITRE ATLAS Course of Action	LLM Meta Prompt Extraction (e98acce8-ed69-4ebe-845b-1bcb662836ba)	MITRE ATLAS Attack Pattern	1
Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540)	MITRE ATLAS Course of Action	LLM Data Leakage (45d378aa-20ae-401d-bf61-7f00104eeaca)	MITRE ATLAS Attack Pattern	1
Generative AI Model Alignment (1fca595d-b140-4ce0-8fd8-c4c6bee87540)	MITRE ATLAS Course of Action	LLM Prompt Injection (19cd2d12-66ff-487c-a05c-e058b027efc9)	MITRE ATLAS Attack Pattern	1