Enhancing Reinforcement Learning via Transformer-Based State Predictive Representations

Enhancing state representations can effectively mitigate the issue of low sample efficiency in reinforcement learning (RL) within high-dimensional input environments. Existing methods attempt to improve sample efficiency by learning predictive state representations from sequence data. However, there...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on artificial intelligence 2024-09, Vol.5 (9), p.4364-4375
Hauptverfasser:	Liu, Minsong, Zhu, Yuanheng, Chen, Yaran, Zhao, Dongbin
Format:	Artikel
Sprache:	eng
Schlagworte:	Data models Predictive models Reinforcement learning Reinforcement learning (RL) Representation learning self-supervised Learning Task analysis Training transformer Transformers
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Enhancing state representations can effectively mitigate the issue of low sample efficiency in reinforcement learning (RL) within high-dimensional input environments. Existing methods attempt to improve sample efficiency by learning predictive state representations from sequence data. However, there still remain significant challenges in achieving a comprehensive understanding and learning of information within long sequences. Motivated by this, we introduce a transformer-based state predictive representations (TSPR) 1 1 Our code will be released at https://github.com/gourmet-liu/TSPR auxiliary task that promotes better representation learning through self-supervised goals. Specifically, we design a transformer-based predictive model to establish unidirectional and bidirectional prediction tasks for predicting state representations within the latent space. TSPR effectively exploits contextual information within sequences to learn more informative state representations, thereby contributing to the enhancement of policy training in RL. Extensive experiments demonstrate that the combination of TSPR with off-policy RL algorithms leads to a substantial improvement in the sample efficiency of RL. Furthermore, TSPR outperforms state-of-the-art sample-efficient RL methods on both the multiple continuous control (DMControl) and discrete control(Atari) tasks.
ISSN:	2691-4581 2691-4581
DOI:	10.1109/TAI.2024.3379969