Multirobot Coverage Path Planning Based on Deep Q-Network in Unknown Environment
Published in: Journal of Robotics, 2022-08, Vol. 2022, p. 1-15
Main authors: , ,
Format: Article
Language: English
Online access: Full text
Summary: To address the safety risks, high coverage-repetition rates, and restrictive assumptions of multirobot coverage path planning (MCPP) in unknown environments, this paper adopts Deep Q-Network (DQN) as part of its method, given DQN's strong ability to approximate the optimal action-value function. A deduction method and several environment-handling techniques are then proposed to improve the decision-making stage. The deduction method hypothesizes a movement direction for each robot, tallies the reward the robots would obtain that way, and then determines the actual movement directions in combination with the DQN. Accordingly, the whole algorithm is divided into two parts: offline training and online decision-making. Online decision-making relies on the sliding-view method and probability statistics to handle nonstandard map sizes and unknown environments, and on the deduction method to improve coverage efficiency. Simulation results show that the proposed online method performs close to the offline algorithm, which requires lengthy optimization, while also being more stable. The study thereby ameliorates several performance defects of current MCPP methods in unknown environments.
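The deduction step summarized above lends itself to a short illustration. The following is a minimal sketch, not the paper's implementation: it assumes a grid world where covered cells are marked 1, a one-step reward of +1 for a newly covered cell and -1 for a repeat, and a blending weight between the deduced reward and the DQN's Q-values. The names `sliding_view`, `deduce_action`, `view`, and `alpha` are hypothetical, introduced only for this example.

```python
import numpy as np
import torch

def sliding_view(grid, robot_pos, view=5):
    """Crop a fixed-size window around the robot, padding cells outside
    the map as already covered, so a DQN trained offline on standard-size
    maps can be queried on maps of nonstandard size (an assumption about
    how the paper's sliding-view method operates)."""
    pad = view // 2
    padded = np.pad(grid, pad, constant_values=1)  # out-of-map cells count as covered
    r, c = robot_pos[0] + pad, robot_pos[1] + pad
    return padded[r - pad:r + pad + 1, c - pad:c + pad + 1]

def deduce_action(grid, robot_pos, q_network, actions, alpha=0.5):
    """For each candidate move, deduce the one-step reward the robot would
    collect, then blend that deduced reward with the DQN's Q-value to pick
    the actual movement direction."""
    window = sliding_view(grid, robot_pos)
    state = torch.tensor(window, dtype=torch.float32).flatten().unsqueeze(0)
    with torch.no_grad():
        q_values = q_network(state).squeeze(0).numpy()  # one Q-value per action

    scores = np.full(len(actions), -np.inf)
    rows, cols = grid.shape
    for i, (dr, dc) in enumerate(actions):
        r, c = robot_pos[0] + dr, robot_pos[1] + dc
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # never deduce a move off the map
        deduced = 1.0 if grid[r, c] == 0 else -1.0  # new cell vs. repeated cell
        scores[i] = alpha * deduced + (1 - alpha) * q_values[i]
    return int(np.argmax(scores))  # index into `actions`
```

In a multirobot setting this step would run per robot at each decision point; the fixed `alpha` blend here simply stands in for however the paper actually combines the deduced reward with the DQN output.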
ISSN: 1687-9600, 1687-9619
DOI: 10.1155/2022/6825902