Action-driven contrastive representation for reinforcement learning

Bibliographic details
Published in:	PLoS ONE 2022-03, Vol.17 (3), p.e0265456
Main authors:	Kim, Minbeom; Rho, Kyeongha; Kim, Yong-Duk; Jung, Kyomin
Format:	Article
Language:	English
Subjects:	Algorithms; Analysis; Artificial intelligence; Benchmarks; Biology and Life Sciences; Computer and Information Sciences; Computer engineering; Control tasks; Dictionaries; Efficiency; Engineering and Technology; Evaluation; Human performance; Learning; Machine learning; Physical Sciences; Reinforcement; Reinforcement learning (Machine learning); Reinforcement, Psychology; Representations; Research and Analysis Methods; Reward; Social Sciences

Description
Abstract:	In reinforcement learning, reward-driven feature learning directly from high-dimensional images faces two challenges: sample efficiency in solving control tasks and generalization to unseen observations. Prior works have addressed these issues by learning representations from pixel inputs. However, those representations were either vulnerable to the high diversity inherent in environments or did not capture the characteristics needed to solve control tasks. To mitigate these problems, we propose a novel contrastive representation method, Action-Driven Auxiliary Task (ADAT), which forces a representation to concentrate on the features essential for deciding actions and to ignore control-irrelevant details. In ADAT's augmented state-action dictionary, the agent learns a representation that maximizes agreement between observations sharing the same action. The proposed method significantly outperforms model-free and model-based algorithms on Atari and OpenAI ProcGen, widely used benchmarks for sample efficiency and generalization.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0265456
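The abstract states the objective only in words ("maximize agreement between observations sharing the same actions"). As an illustration, here is a minimal sketch that assumes a supervised-contrastive (SupCon-style) loss: within a batch, other observations with the same discrete action are positives and all remaining observations are negatives. This is not the authors' ADAT implementation. It omits the data augmentation and the state-action dictionary mentioned in the abstract, and the function name `action_contrastive_loss` and its `temperature` parameter are hypothetical.

```python
import numpy as np


def action_contrastive_loss(z, actions, temperature=0.1):
    """Hypothetical action-conditioned contrastive loss (SupCon-style sketch, not ADAT itself).

    z: (N, D) array of observation embeddings produced by an encoder.
    actions: (N,) integer array with the discrete action taken at each observation.
    Other samples with the same action are positives; all others are negatives.
    Returns a non-negative scalar loss.
    """
    z = np.asarray(z, dtype=float)
    actions = np.asarray(actions)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # Positives: other samples in the batch that share the anchor's action.
    pos_mask = (actions[:, None] == actions[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no anchor has a positive, so there is nothing to pull together
    # L2-normalise so that the dot product is cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    # An anchor is never compared with itself.
    logits = np.where(self_mask, -np.inf, logits)
    # Numerically stable row-wise log-softmax over the other N-1 samples.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Mean log-probability each anchor assigns to its positives.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_pos = pos_log_prob[has_pos] / pos_count[has_pos]
    return float(-mean_pos.mean())


# Usage: embeddings grouped by action should score lower than mixed ones.
acts = np.array([0, 0, 1, 1])
aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(action_contrastive_loss(aligned, acts) < action_contrastive_loss(mixed, acts))  # True
```

Because positives are selected by action rather than by augmenting the same frame, this loss is low only when observations that lead to the same action map to nearby embeddings. That is one way to encode the abstract's goal of ignoring control-irrelevant details.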