Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting

Conference paper at ICLR 2023. Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced st...

Bibliographic Details
Main authors: Hong, Zhang-Wei; Agrawal, Pulkit; Combes, Rémi Tachet des; Laroche, Romain
Format: Article
Language: English