Geometric-Averaged Preference Optimization for Soft Preference Labels
Format: Article
Language: English
Online access: Order full text
Abstract: Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic. However, human preferences can vary across individuals, and therefore should be represented distributionally. In this work, we introduce distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. This approach adjusts the scale of the learning loss based on the soft labels so that the loss approaches zero when the responses are close to equally preferred. This simple modification can be easily applied to any DPO-based method and mitigates the over-optimization and objective mismatch that prior works suffer from. Our experiments simulate soft preference labels with AI feedback from LLMs and demonstrate that geometric averaging consistently improves performance on standard benchmarks for alignment research. In particular, we observe more preferable responses than with binary labels, and significant improvements where modestly confident labels are in the majority.
DOI: 10.48550/arxiv.2409.06691
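
The abstract above describes weighting DPO by a geometric average of output likelihoods under soft labels. Below is a minimal PyTorch sketch of how such a loss might look, assuming that a weighted geometric mean of the two response likelihoods (with weights p and 1 - p) collapses algebraically to a (2p - 1) scale on the usual DPO log-ratio margin; the function and argument names are illustrative and not taken from the paper.

```python
# Minimal sketch (assumed form, not the paper's reference implementation):
# soft-label-weighted DPO loss where geometric averaging of likelihoods
# reduces to scaling the DPO margin by (2 * p_hat - 1).
import torch
import torch.nn.functional as F

def geometric_averaged_dpo_loss(
    logp_w: torch.Tensor,      # log pi_theta(y1 | x), the (softly) preferred response
    logp_l: torch.Tensor,      # log pi_theta(y2 | x), the less preferred response
    ref_logp_w: torch.Tensor,  # log pi_ref(y1 | x)
    ref_logp_l: torch.Tensor,  # log pi_ref(y2 | x)
    p_hat: torch.Tensor,       # soft label P(y1 preferred over y2), e.g. from AI feedback
    beta: float = 0.1,
) -> torch.Tensor:
    # Standard DPO margin: difference of policy-to-reference log ratios.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # Weighting both responses' log-likelihoods by (p_hat, 1 - p_hat) and taking
    # the difference leaves a (2 * p_hat - 1) factor on the margin, so the
    # learning signal vanishes as p_hat -> 0.5 (responses equally preferred).
    scale = 2.0 * p_hat - 1.0
    return -F.logsigmoid(beta * scale * margin).mean()
```

With p_hat = 1 this reduces to the standard binary-label DPO loss, while intermediate labels shrink the per-example gradient, which is the mechanism the abstract credits for mitigating over-optimization.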