Stochastic Integrated Actor-Critic for Deep Reinforcement Learning

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-05, Vol. 35 (5), p. 6654-6666
Main Authors: Zheng, Jiaohao; Kurt, Mehmet Necip; Wang, Xiaodong
Format: Article
Language: English
Abstract: We propose a deep stochastic actor-critic algorithm with an integrated network architecture and fewer parameters. We stabilize the learning procedure via an adaptive objective added to the critic's loss and a smaller learning rate for the parameters shared between the actor and the critic. Moreover, we propose a mixed on-off policy exploration strategy to speed up learning. Experiments show that our algorithm reduces the sample complexity by 50%-93% compared with the state-of-the-art deep reinforcement learning (RL) algorithms twin delayed deep deterministic policy gradient (TD3), soft actor-critic (SAC), proximal policy optimization (PPO), advantage actor-critic (A2C), and interpolated policy gradient (IPG) on the continuous control tasks LunarLander, BipedalWalker, BipedalWalkerHardCore, Ant, and Minitaur in the OpenAI Gym.
ISSN: 2162-237X
EISSN: 2162-2388
DOI: 10.1109/TNNLS.2022.3212273
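The record gives no implementation details, but two ideas in the abstract, an integrated network in which the actor and critic share most parameters, and a smaller learning rate for those shared parameters, can be illustrated with a minimal PyTorch sketch. Everything below (the class name IntegratedActorCritic, layer sizes, and the learning-rate values) is an illustrative assumption, not the authors' published configuration.

```python
import torch
import torch.nn as nn

class IntegratedActorCritic(nn.Module):
    """Sketch of an integrated (parameter-sharing) stochastic actor-critic.

    A shared trunk feeds both a stochastic (Gaussian) policy head and a
    Q-value head, so the actor and critic reuse most parameters. Sizes
    and structure are assumptions, not taken from the paper.
    """

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        # Trunk shared between actor and critic (the "integrated" part).
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Stochastic actor head: mean and log-std of a Gaussian policy.
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)
        # Critic head: Q(s, a), with the action appended to shared features.
        self.q = nn.Linear(hidden + act_dim, 1)

    def policy(self, obs):
        h = self.shared(obs)
        std = self.log_std(h).clamp(-20, 2).exp()
        return torch.distributions.Normal(self.mu(h), std)

    def q_value(self, obs, act):
        h = self.shared(obs)
        return self.q(torch.cat([h, act], dim=-1))


net = IntegratedActorCritic(obs_dim=8, act_dim=2)

# Per-group learning rates: the shared trunk is updated more slowly than
# the heads, mirroring the abstract's stabilization idea. The values
# (1e-4 vs. 3e-4) are placeholders, not the paper's settings.
optimizer = torch.optim.Adam([
    {"params": net.shared.parameters(), "lr": 1e-4},
    {"params": list(net.mu.parameters())
               + list(net.log_std.parameters())
               + list(net.q.parameters()), "lr": 3e-4},
])

# Quick usage check: sample a stochastic action and score it with the critic.
obs = torch.randn(4, 8)
dist = net.policy(obs)
act = dist.rsample()           # reparameterized sample from the policy
q = net.q_value(obs, act)      # shape (4, 1)
```

Grouping parameters in the optimizer is one common way to realize "a smaller learning rate for the shared parameters"; the paper's adaptive critic objective and mixed on-off policy exploration are not reproduced here.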