DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization

The rapid rise of large language models (LLMs) has unlocked many applications but also underscores the challenge of aligning them with diverse values and preferences. Direct Preference Optimization (DPO) is central to alignment but constrained by fixed divergences and limited feature transformations...
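For context, the "fixed divergences" the abstract refers to is the implicit KL anchoring in the standard DPO objective of Rafailov et al. (2023), which this article sets out to generalize. A minimal sketch of that standard loss in LaTeX (the notation follows the original DPO paper, not this article):

    \mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
      = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\Big[
          \log \sigma\Big(
            \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
          - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
          \Big)\Big]

Here y_w and y_l are the preferred and dispreferred responses to prompt x, \sigma is the logistic sigmoid, and \beta scales the implicit KL penalty that keeps the policy \pi_\theta close to the reference model \pi_{\mathrm{ref}}.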

Bibliographic Details
Main Authors: Das, Amitava; Trivedy, Suranjana; Khanna, Danush; Roy, Rajarshi; Singh, Gurpreet; Ghosh, Basab; Narsupalli, Yaswanth; Jain, Vinija; Sharma, Vasu; Reganti, Aishwarya Naresh; Chadha, Aman
Format: Article
Language: English