Cooperative offensive decision-making for soccer robots based on bi-channel Q-value evaluation MADDPG


Bibliographic Details
Published in: Engineering Applications of Artificial Intelligence, 2023-05, Vol. 121, p. 105994, Article 105994
Authors: Yu, Lingli; Li, Keyi; Huo, Shuxin; Zhou, Kaijun
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: Discrete–continuous hybrid action decision-making is common in real-world applications, yet multi-robot deep reinforcement learning over parameterized action spaces remains little studied. Cooperative decision-making for soccer robots is a representative task for studying it. In this paper, the reward function is designed to guide soccer robots in learning cooperative offense: a shooting-angle reward is added to the basic reward function to improve the scoring rate. Moreover, a MADDPG network structure based on bi-channel Q-value estimation (BI-MAPDDPG) is proposed, in which two Critic-network channels, combined via a discrete-action weight, handle the coupling between discrete actions and their continuous action parameters. Finally, simulation results show that soccer robots' cooperative offensive decision-making based on BI-MAPDDPG is robust and scalable.
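The abstract's bi-channel idea can be illustrated with a minimal sketch (not the authors' code): one channel scores the discrete action choice, a second channel scores the chosen action's continuous parameters, and a discrete-action weight `w` couples the two estimates into one Q-value. The layer sizes, the toy linear "networks", and the hyperparameter `w` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Toy random linear map, standing in for a Critic MLP."""
    W = rng.normal(scale=0.1, size=(out_dim, in_dim))
    b = np.zeros(out_dim)
    return lambda v: W @ v + b

state_dim, n_discrete, param_dim = 8, 3, 2  # assumed sizes

# Channel 1: Q over the discrete actions, from the state alone.
q_discrete = linear(state_dim, n_discrete)
# Channel 2: Q for the continuous parameters of the chosen action.
q_continuous = linear(state_dim + param_dim, 1)

def bi_channel_q(state, k, params, w=0.5):
    """Combine the two channels' estimates for discrete action k with
    continuous parameters `params`; w is a hypothetical coupling weight."""
    q1 = q_discrete(state)[k]
    q2 = q_continuous(np.concatenate([state, params]))[0]
    return w * q1 + (1.0 - w) * q2

s = rng.normal(size=state_dim)
x = rng.normal(size=param_dim)
q = bi_channel_q(s, k=1, params=x)
print(q)
```

In the paper's actual architecture the channels are learned Critic networks trained within MADDPG; this sketch only shows how a single scalar Q-value can be assembled from a discrete choice and its continuous parameters.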
ISSN:0952-1976
1873-6769
DOI:10.1016/j.engappai.2023.105994