Multi-USV System Cooperative Underwater Target Search Based on Reinforcement Learning and Probability Map

Bibliographic details
Published in:	Mathematical Problems in Engineering 2020, Vol. 2020 (2020), p. 1-12
Main authors:	Xie, Jiajia; Wang, Min; Peng, Yan; Liu, Yuan; Zhou, Rui
Format:	Article
Language:	English
Subjects:	Autonomous navigation; Barriers; Computer simulation; Control algorithms; Efficiency; Engineering; Environmental monitoring; Machine learning; Modules; Search algorithms; Surface vehicles; Underwater; Unmanned vehicles
Online access:	Full text

Description
Summary:	An unmanned surface vehicle (USV) is a robotic system with autonomous planning, driving, and navigation capabilities. As applications develop, USV missions are becoming increasingly complex, and a single USV often cannot meet the mission requirements. Compared with a single USV, a multi-USV system offers fewer perceptual constraints, a larger operating range, and stronger operational capability. For missions in which a multi-USV system searches for multiple stationary underwater targets in an environment with obstacles, we propose a novel cooperative search algorithm (CSBDRL) based on reinforcement learning (RL) and the probability map method. CSBDRL consists of an environmental sense module and a policy module, organized in a “divide and conquer” policy-based architecture. The environmental sense module provides environmental sense values using the probability map method. The policy module learns the optimal policy using RL. In CSBDRL, the mission environment is modeled and a corresponding reward function is designed so that the environment is explored effectively and policies are learned. We test CSBDRL in a simulation environment and compare it with other methods. The results show that, compared with those methods, CSBDRL gives the multi-USV system higher search efficiency: targets are found more quickly and accurately, and the USVs avoid obstacles in time during the mission.
ISSN:	1024-123X 1563-5147
DOI:	10.1155/2020/7842768
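
Illustration:	The summary does not say how the probability map is updated. As a minimal sketch of the general idea only, and not the authors' CSBDRL implementation, the Python below applies a standard Bayesian update to a grid of target-presence probabilities after each sensor reading. The function names (`update_cell`, `update_map`), the grid layout, and the sensor parameters `p_detect` and `p_false_alarm` are all assumptions made for this illustration.

```python
# Hypothetical sketch: Bayesian update of a target-presence probability map.
# Nothing here is taken from the paper; names and parameter values are assumed.

def update_cell(prior, detected, p_detect=0.9, p_false_alarm=0.1):
    """Return P(target in cell | reading) from the prior and one binary reading.

    p_detect:      assumed P(sensor reports a target | target present)
    p_false_alarm: assumed P(sensor reports a target | cell empty)
    """
    if detected:
        like_target, like_empty = p_detect, p_false_alarm
    else:
        like_target, like_empty = 1.0 - p_detect, 1.0 - p_false_alarm
    numerator = like_target * prior
    return numerator / (numerator + like_empty * (1.0 - prior))


def update_map(prob_map, observations, **sensor):
    """Apply update_cell to every observed cell and return a new map.

    prob_map:     list of rows, each a list of probabilities in [0, 1]
    observations: dict mapping (row, col) to True/False sensor readings
    """
    new_map = [row[:] for row in prob_map]  # copy so the input is not mutated
    for (r, c), detected in observations.items():
        new_map[r][c] = update_cell(prob_map[r][c], detected, **sensor)
    return new_map


if __name__ == "__main__":
    grid = [[0.5, 0.5], [0.5, 0.5]]  # uninformed prior in every cell
    grid = update_map(grid, {(0, 0): True, (1, 1): False})
    print(grid)
```

In a setup like this, cells that no USV has observed keep their prior, so a search policy could use the map to prefer regions that are still uncertain; how CSBDRL actually feeds such values to its policy module is not stated in the summary.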