Accelerating deep reinforcement learning model for game strategy

Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2020-09, Vol. 408, pp. 157-168
Main Authors: Li, Yifan; Fang, Yuchun; Akhtar, Zahid
Format: Article
Language: English
Online access: Full text
Description
Summary: In recent years, deep reinforcement learning has achieved impressive accuracy in games compared with traditional methods. Prior schemes utilized Convolutional Neural Networks (CNNs) or Long Short-Term Memory networks (LSTMs) to improve the performance of the agents. In this paper, we consider the issue from a different perspective: the training and inference of deep reinforcement learning must be performed with limited computing resources. Specifically, we propose two efficient neural network architectures for deep reinforcement learning: Light-Q-Network (LQN) and Binary-Q-Network (BQN). In LQN, depth-wise separable convolutions are used to save memory and computation, while in BQN the weights of the convolutional layers are binarized, which shortens training time and reduces memory consumption. We evaluate our approach on the Atari 2600 domain and StarCraft II mini-games. The results demonstrate the efficiency of the proposed architectures. Although the agents' performance in most games remains super-human, the proposed methods advance the agent from sub-human to super-human performance in particular games. We also empirically find that non-standard convolutions and non-full-precision networks do not impair the agent's learning of game strategies.
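
To make the two architectural ideas named in the summary concrete, below is a minimal, hypothetical PyTorch-style sketch, not taken from the paper: a depth-wise separable convolution block in the spirit of LQN, and a convolution with binarized weights trained through a straight-through estimator, which is one common way to realize BQN-style binary layers. All class names, layer shapes, and the choice of PyTorch are illustrative assumptions; the authors' exact formulation and binarization scheme may differ.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthwiseSeparableConv(nn.Module):
    """LQN-style block (assumed): a depth-wise conv followed by a 1x1
    point-wise conv, needing far fewer parameters and FLOPs than a
    standard convolution with the same input/output shape."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   stride=stride, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return F.relu(self.pointwise(self.depthwise(x)))


class BinaryConv2d(nn.Conv2d):
    """BQN-style layer (assumed): the forward pass uses the sign of the
    weights; gradients reach the latent full-precision weights through a
    straight-through estimator."""

    def forward(self, x):
        w_bin = torch.sign(self.weight)
        # Straight-through estimator: binary weights in the forward pass,
        # identity gradient w.r.t. the full-precision weights in backward.
        w = w_bin.detach() + self.weight - self.weight.detach()
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


# Example: a tiny Q-network over stacked 84x84 Atari frames
# (6 actions assumed purely for illustration).
q_net = nn.Sequential(
    DepthwiseSeparableConv(4, 32, kernel_size=8, stride=4),
    DepthwiseSeparableConv(32, 64, kernel_size=4, stride=2),
    nn.Flatten(),
    nn.LazyLinear(6),
)
q_values = q_net(torch.zeros(1, 4, 84, 84))  # -> shape (1, 6)
```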
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2019.06.110