Towards Comprehensive Preference Data Collection for Reward Modeling
Format: Article
Language: English
Abstract: Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models (LLMs) with human preferences, thereby improving the quality of the generated responses. A critical component of RLHF is the reward model, which is trained on preference data and outputs a scalar reward during inference. However, the collection of preference data still lacks thorough investigation. Recent studies indicate that preference data is collected either by AI or by humans, where chosen and rejected instances are identified among pairwise responses. We question whether this process effectively filters out noise and ensures sufficient diversity in the collected data. To address these concerns, we propose, for the first time, a comprehensive framework for preference data collection that decomposes the process into four incremental steps: Prompt Generation, Response Generation, Response Filtering, and Human Labeling. This structured approach ensures the collection of high-quality preferences while reducing reliance on human labor. We conducted comprehensive experiments on the data collected at each stage, demonstrating the effectiveness of the proposed data collection method.
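The abstract describes a four-stage collection pipeline feeding a scalar reward model trained on pairwise preferences. As a rough illustration only, the minimal Python sketch below shows how such a pipeline and a standard Bradley-Terry style pairwise loss might be wired together; the objects `prompt_source`, `policy_model`, `judge`, and `annotators`, along with all sampling counts and thresholds, are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch only: interfaces, names, and thresholds are assumptions,
# not the paper's actual implementation.

import torch
import torch.nn.functional as F


def collect_preference_data(prompt_source, policy_model, judge, annotators):
    """Walk through the four incremental stages end to end."""
    # 1. Prompt Generation: gather or synthesize a diverse prompt set.
    prompts = prompt_source.sample(n=1000)

    # 2. Response Generation: sample several candidate responses per prompt.
    candidates = {p: policy_model.generate(p, num_samples=4) for p in prompts}

    # 3. Response Filtering: drop low-quality responses (here via an
    #    automatic judge score) to reduce noise before human labeling.
    filtered = {
        p: [r for r in rs if judge.score(p, r) > 0.5]
        for p, rs in candidates.items()
    }

    # 4. Human Labeling: annotators mark chosen vs. rejected among the
    #    remaining pairs, yielding (prompt, chosen, rejected) triples.
    return annotators.label_pairs(filtered)


def pairwise_reward_loss(reward_model, prompt, chosen, rejected):
    """Standard Bradley-Terry style objective for training a scalar reward model."""
    r_chosen = reward_model(prompt, chosen)      # scalar reward for the preferred response
    r_rejected = reward_model(prompt, rejected)  # scalar reward for the rejected response
    # Maximize the margin between chosen and rejected rewards.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The pipeline function is deliberately schematic: each stage only consumes the output of the previous one, mirroring the incremental structure the abstract describes, while the loss function reflects the common way a scalar reward model is fit to chosen/rejected pairs.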
DOI: 10.48550/arxiv.2406.16486