Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: The correct specification of reward models is a well-known challenge in reinforcement learning. Hand-crafted reward functions often lead to inefficient or suboptimal policies and may not be aligned with user values. Reinforcement learning from human feedback is a successful technique that can mitigate such issues; however, collecting human feedback can be laborious. Recent works have solicited feedback from pre-trained large language models rather than humans to reduce or eliminate human effort; however, these approaches yield poor performance in the presence of hallucination and other errors. This paper studies the advantages and limitations of reinforcement learning from large language model feedback and proposes a simple yet effective method for soliciting and applying feedback as a potential-based shaping function. We theoretically show that inconsistent rankings, which approximate ranking errors, lead to uninformative rewards with our approach. Our method empirically improves convergence speed and policy returns over commonly used baselines even with significant ranking errors, and eliminates the need for complex post-processing of reward functions.
DOI: 10.48550/arxiv.2410.17389
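For context, the shaping approach mentioned in the abstract builds on standard potential-based reward shaping. The sketch below illustrates only that generic mechanism, not the paper's specific method: the potential table `phi`, the discount `gamma`, and the example values are hypothetical placeholders, whereas in the setting described above the potential would be derived from (possibly error-prone) LLM rankings.

```python
# Minimal illustrative sketch of potential-based reward shaping
# (standard technique; not the paper's exact implementation).
# `phi` is a hypothetical placeholder potential; in the setting above
# it would be obtained from possibly error-prone LLM feedback.

def shaped_reward(env_reward, phi_s, phi_s_next, gamma=0.99):
    """Environment reward plus the shaping term F(s, s') = gamma*phi(s') - phi(s)."""
    return env_reward + gamma * phi_s_next - phi_s

# Example usage with an arbitrary potential over three integer states.
phi = {0: 0.0, 1: 0.5, 2: 1.0}  # hypothetical potentials, e.g. from LLM rankings
r = shaped_reward(env_reward=1.0, phi_s=phi[0], phi_s_next=phi[1])
print(r)  # 1.0 + 0.99 * 0.5 - 0.0 = 1.495
```

Because the shaping term telescopes along trajectories, potential-based shaping leaves the optimal policy unchanged; this is consistent with the abstract's claim that inconsistent rankings lead to uninformative, rather than actively misleading, rewards.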