Alignment with Preference Optimization Is All You Need for LLM Safety

We demonstrate that preference optimization methods can effectively enhance LLM safety. Applying various alignment techniques to the Falcon 11B model using safety datasets, we achieve a significant boost in global safety score (from $57.64\%$ to $99.90\%$) as measured by LlamaGuard 3 8B, competing w...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Alami, Reda, Almansoori, Ali Khalifa, Alzubaidi, Ahmed, Seddik, Mohamed El Amine, Farooq, Mugariya, Hacid, Hakim
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Science - Learning
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Schreiben Sie den ersten Kommentar!