Learning the Beneficial, Forgetting the Harmful: High generalization reinforcement learning within evolving representations

Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2025-02, Vol. 619, Article 129139
Main authors: Zheng, Jiawei; Song, Yonghong; Lin, Guojiao; Duan, Jiayi; Lin, Hao; Li, Shuaitao
Format: Article
Language: English
Description
Abstract: In visual reinforcement learning (RL), a key problem is how to learn policies that generalize to unseen environments. Recently, saliency guidance and representation learning have shown excellent performance in improving the generalization of agent policies. However, bottlenecks in visual RL, such as the difficulty of exploiting dynamic features, primacy bias, and error accumulation caused by hyperparameter selection, limit further improvements in generalization. To alleviate these problems, we propose a novel method: Learning the Beneficial, Forgetting the Harmful (LBFH). LBFH is an effective combination of Dynamic model-based Dual Feature Alignment (DDFA) and Reinforcement Learning with Periodic Resets (RLPR). DDFA uses dynamics models and self-supervised learning to learn representations that capture dynamic features. RLPR uses a saliency-guided Q-network and periodic resets, allowing the agent to focus on task-relevant pixels while periodically releasing accumulated errors. LBFH combines the advantages of both, allowing the agent to forget accumulated errors within the evolving representation and thereby learn highly generalizable policies. To verify the effectiveness of LBFH, we conducted extensive experiments on multiple tasks in the DMControl Generalization Benchmark (DMControl-GB) and on Robotic Manipulation. The results show that LBFH outperforms previous state-of-the-art methods on two different visual control benchmarks.
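The periodic-reset idea attributed to RLPR can be illustrated with a short sketch. The following PyTorch snippet is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: the names QNetwork, periodic_reset, and reset_interval are invented for the example. It shows how a Q-value head can be re-initialized at fixed intervals so that accumulated errors (e.g., from primacy bias) are discarded, while a separately trained encoder keeps its evolved representation.

    # Hypothetical sketch of periodic Q-head resets, in the spirit of RLPR.
    # Names (QNetwork, periodic_reset, reset_interval) are assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Minimal Q-head over (possibly saliency-masked) encoder features."""
        def __init__(self, feature_dim: int, num_actions: int):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(feature_dim, 256),
                nn.ReLU(),
                nn.Linear(256, num_actions),
            )

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            return self.head(features)

    def periodic_reset(q_net: nn.Module, step: int, reset_interval: int) -> None:
        """Re-initialize the Q-head every reset_interval steps, releasing
        accumulated TD errors while the encoder's representation is untouched."""
        if step > 0 and step % reset_interval == 0:
            for module in q_net.modules():
                if isinstance(module, nn.Linear):
                    module.reset_parameters()

    # Usage: call the reset between gradient updates in the training loop.
    q_net = QNetwork(feature_dim=50, num_actions=4)
    for step in range(100_000):
        # ... compute TD loss on encoded observations and update q_net ...
        periodic_reset(q_net, step, reset_interval=20_000)

In this sketch only the value head forgets; whether the paper resets the full Q-network, the head, or other components is not specified in the abstract.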
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2024.129139