Hierarchical learning from human preferences and curiosity

Recent success in scaling deep reinforcement algorithms (DRL) to complex problems has been driven by well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally extremely sparse. One solution to this problem is to introduce human guidance to...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied intelligence (Dordrecht, Netherlands) Netherlands), 2022-05, Vol.52 (7), p.7459-7479
Hauptverfasser: Bougie, Nicolas, Ichise, Ryutaro
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Recent success in scaling deep reinforcement algorithms (DRL) to complex problems has been driven by well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally extremely sparse. One solution to this problem is to introduce human guidance to drive the agent’s learning. Although low-level demonstrations is a promising approach, it was shown that such guidance may be difficult for experts to demonstrate since some tasks require a large amount of high-quality demonstrations. In this work, we explore human guidance in the form of high-level preferences between sub-goals, leading to drastic reductions in both human effort and cost of exploration. We design a novel hierarchical reinforcement learning method that introduces non-expert human preferences at the high-level, and curiosity to drastically speed up the convergence of subpolicies to reach any sub-goals. We further propose a strategy based on curiosity to automatically discover sub-goals. We evaluate the proposed method on 2D navigation tasks, robotic control tasks, and image-based video games (Atari 2600), which have high-dimensional observations, sparse rewards, and complex state dynamics. The experimental results show that the proposed method can learn significantly faster than traditional hierarchical RL methods and drastically reduces the amount of human effort required over standard imitation learning approaches.
ISSN:0924-669X
1573-7497
DOI:10.1007/s10489-021-02726-3