Imitating Language via Scalable Inverse Reinforcement Learning

The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability of maximum likelihood estimation (MLE) for next token predict...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Wulfmeier, Markus, Bloesch, Michael, Vieillard, Nino, Ahuja, Arun, Bornschein, Jorg, Huang, Sandy, Sokolov, Artem, Barnes, Matt, Desjardins, Guillaume, Bewley, Alex, Bechtle, Sarah Maria Elisabeth, Springenberg, Jost Tobias, Momchev, Nikola, Bachem, Olivier, Geist, Matthieu, Riedmiller, Martin
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!