Autonomous underwater vehicle formation control and obstacle avoidance using multi-agent generative adversarial imitation learning

Bibliographic Details
Published in: Ocean Engineering, 2022-10, Vol. 262, p. 112182, Article 112182
Main authors: Fang, Zheng, Jiang, Dong, Huang, Jie, Cheng, Chunxi, Sha, Qixin, He, Bo, Li, Guangliang
Format: Article
Language: English
Online access: Full text
Abstract: Autonomous underwater vehicles (AUVs) are widely used in complex underwater missions such as bottom surveys and data collection. Multiple AUVs can cooperatively complete tasks that a single AUV cannot accomplish. Recently, multi-agent reinforcement learning (MARL) has been introduced to improve multi-AUV control in uncertain marine environments. However, it is very difficult, and often impractical, to design effective and efficient reward functions for various tasks. In this paper, we implemented multi-agent generative adversarial imitation learning (MAGAIL) from expert-demonstrated trajectories for formation control and obstacle avoidance of multiple AUVs. In addition, a decentralized training with decentralized execution (DTDE) framework was adopted to alleviate the communication problem in underwater environments. Moreover, to help the discriminator accurately judge the quality of an AUV's trajectory in the two tasks and to increase the convergence speed, we improved upon MAGAIL by dividing the state–action pairs of the expert trajectory for each AUV into two groups and updating the discriminator with an equal number of state–action pairs randomly selected from both groups. Our experimental results on a simulated AUV system, modeling our lab's Sailfish 210 in the Gazebo simulation environment, show that MAGAIL allows multi-AUV control policies to achieve better performance than traditional multi-agent deep reinforcement learning from a fine-tuned reward function (IPPO). Moreover, control policies trained via MAGAIL in simple tasks generalize better to complex tasks than those trained via IPPO.

Highlights:
• Implemented MAGAIL for formation control and obstacle avoidance of multiple AUVs.
• Adopted a DTDE framework to address the limited-communication problem.
• Results show MAGAIL allows AUVs to achieve better performance than IPPO.
• MAGAIL generalized better than IPPO in two new, more complex tasks.
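The balanced discriminator update described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical PyTorch rendering, not the authors' released code: the network sizes, the interpretation of the two expert groups (e.g. formation-control pairs vs. obstacle-avoidance pairs), and the standard GAIL binary cross-entropy loss are all assumptions made for illustration.

```python
# Hypothetical sketch of a GAIL-style discriminator with balanced expert
# sampling, as outlined in the abstract. All design choices here are
# illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores a state-action pair; trained so expert pairs -> 1, policy pairs -> 0."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def sample_balanced_expert_batch(group_a, group_b, batch_size):
    """Draw an equal number of expert state-action pairs from each group,
    mirroring the two-group split of each AUV's expert trajectory."""
    half = batch_size // 2
    idx_a = torch.randint(len(group_a[0]), (half,))
    idx_b = torch.randint(len(group_b[0]), (half,))
    states = torch.cat([group_a[0][idx_a], group_b[0][idx_b]])
    actions = torch.cat([group_a[1][idx_a], group_b[1][idx_b]])
    return states, actions

def update_discriminator(disc, optimizer, expert_groups, policy_batch):
    """One discriminator step: expert pairs labeled 1, policy pairs labeled 0,
    with the expert half of the batch sampled equally from both groups."""
    bce = nn.BCEWithLogitsLoss()
    pol_s, pol_a = policy_batch
    exp_s, exp_a = sample_balanced_expert_batch(*expert_groups, len(pol_s))
    loss = bce(disc(exp_s, exp_a), torch.ones(len(exp_s), 1)) + \
           bce(disc(pol_s, pol_a), torch.zeros(len(pol_s), 1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design intent, per the abstract, is that sampling an equal number of expert pairs from each group keeps the discriminator from being dominated by whichever task contributes more demonstration data, which in turn improves its judgment of trajectory quality and speeds up convergence.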
ISSN: 0029-8018
eISSN: 1873-5258
DOI: 10.1016/j.oceaneng.2022.112182