Revisiting Discrete Soft Actor-Critic
Format: Article
Language: English
Abstract: We study the adaptation of Soft Actor-Critic (SAC), which is considered a state-of-the-art reinforcement learning (RL) algorithm, from continuous to discrete action spaces. We revisit vanilla discrete SAC and provide an in-depth understanding of its Q-value underestimation and performance instability issues when applied to discrete settings. We thereby propose Stable Discrete SAC (SDSAC), an algorithm that leverages an entropy penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action spaces, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/SD-SAC.git
DOI: 10.48550/arxiv.2209.10081
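
The abstract names two stabilizing ingredients for SDSAC: an entropy penalty and double average Q-learning with Q-clip. As a rough, non-authoritative illustration, the PyTorch sketch below shows one plausible reading of the latter: averaging two target critics in the soft Bellman target and applying a PPO-style value clip to the critic loss. All function names, signatures, and hyperparameters here (soft_q_target, clipped_critic_loss, alpha, clip_eps) are assumptions for illustration only; the authors' actual formulation is in the linked repository.

```python
# Illustrative sketch only, not the authors' SD-SAC implementation.
import torch
import torch.nn.functional as F

def soft_q_target(q1_next, q2_next, next_logits, reward, done,
                  gamma=0.99, alpha=0.2):
    """Soft Bellman target using the average of two target critics.

    q1_next, q2_next : [batch, n_actions] target-network Q-values at s'.
    next_logits      : [batch, n_actions] policy logits at s'.
    Averaging the critics (instead of the pessimistic min used in
    continuous SAC) is one way to reduce the underestimation bias
    discussed in the abstract.
    """
    probs = F.softmax(next_logits, dim=-1)        # pi(a'|s')
    log_probs = F.log_softmax(next_logits, dim=-1)
    q_avg = 0.5 * (q1_next + q2_next)
    # Expected soft value over the discrete action set:
    # V(s') = E_{a'~pi}[Q(s', a') - alpha * log pi(a'|s')].
    v_next = (probs * (q_avg - alpha * log_probs)).sum(dim=-1)
    return reward + gamma * (1.0 - done) * v_next

def clipped_critic_loss(q_new, q_old, target, clip_eps=0.5):
    """PPO-style value clipping applied to the critic update ("Q-clip").

    q_new  : current Q(s, a) estimates, shape [batch].
    q_old  : Q(s, a) estimates before the update, shape [batch].
    target : soft Bellman targets, shape [batch].
    Penalizes updates that move Q too far from the previous estimate;
    clip_eps is a hypothetical hyperparameter.
    """
    q_clipped = q_old + torch.clamp(q_new - q_old, -clip_eps, clip_eps)
    return torch.max((q_new - target) ** 2,
                     (q_clipped - target) ** 2).mean()
```

Under these assumptions, averaging the two critics is intended to counteract the underestimation that the min operator induces in discrete settings, while the clip bounds how far a single gradient step can move the Q-estimate, which is one plausible source of the improved stability the abstract reports.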