Sim-real joint experimental verification for an unmanned surface vehicle formation strategy based on multi-agent deterministic policy gradient and line of sight guidance

Bibliographic details
Published in: Ocean Engineering, 2023-02, Vol. 270, p. 113661, Article 113661
Main authors: Li, Yan; Li, Xiaowen; Wei, Xiangwei; Wang, Hao
Format: Article
Language: English
Online access: Full text
Description
Abstract: Operating multiple Unmanned Surface Vehicles (USVs) in formation is an effective way to extend the capabilities of a single USV to relatively complex practical tasks. In this study, we propose a multi-USV formation strategy based on a deep reinforcement learning method, Multi-Agent Deep Deterministic Policy Gradient (MADDPG). Line of Sight (LOS) guidance is integrated into the formation strategy under a leader-follower scheme. Because it does not rely on a dynamic model of the USV, the proposed formation strategy has strong potential to be transferred to other multi-agent systems. To evaluate the performance of the multi-USV formation, we designed two scenarios that reflect practical multi-USV tasks: observation aperture enhancement with a desired formation, and roundup of a dynamic non-cooperative target. The strategy was demonstrated in both a simulation environment and a real-world environment. Compared with other deep reinforcement learning-inspired approaches and with traditional approaches, the MADDPG-based strategy achieved a higher task success rate. It also outperformed Deep Deterministic Policy Gradient (DDPG) in other metrics because it acquires knowledge from dynamic environments more effectively by observing joint information and through centralized training.
• A learning-inspired formation strategy for a multi-USV system is proposed, based on deep reinforcement learning and LOS guidance.
• The multi-USV formation strategy is demonstrated in both a simulation environment and a dynamic real-world environment.
• The task success rate of the strategy shows an average increase of 16.50% over DDPG.
• The proposed formation strategy is a plug-and-play approach with the potential to be transferred to other multi-agent systems.
ISSN: 0029-8018, 1873-5258
DOI: 10.1016/j.oceaneng.2023.113661
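
Illustrative note: the abstract names LOS guidance and a leader-follower scheme but this record does not give their formulas. The sketch below is therefore not the authors' implementation. It assumes a commonly used lookahead-based LOS law for a straight path segment and a fixed follower offset expressed in the leader's body frame. The function names, the `lookahead` parameter, and the planar coordinate convention (angles measured from the x-axis toward the y-axis) are all hypothetical choices made for illustration.

```python
import math


def los_heading(pos, wp_from, wp_to, lookahead=5.0):
    """Desired course from a lookahead-based LOS law (assumed form, not from the paper)."""
    # Direction of the straight path segment wp_from -> wp_to.
    path_angle = math.atan2(wp_to[1] - wp_from[1], wp_to[0] - wp_from[0])
    dx, dy = pos[0] - wp_from[0], pos[1] - wp_from[1]
    # Signed cross-track error: the vessel's offset perpendicular to the path.
    cross_track = -dx * math.sin(path_angle) + dy * math.cos(path_angle)
    # Steer toward a point `lookahead` ahead of the vessel's projection on the path.
    desired = path_angle + math.atan2(-cross_track, lookahead)
    # Wrap the result to [-pi, pi].
    return math.atan2(math.sin(desired), math.cos(desired))


def follower_target(leader_pos, leader_heading, offset):
    """Desired follower position for an offset given in the leader's body frame."""
    c, s = math.cos(leader_heading), math.sin(leader_heading)
    # Rotate the body-frame offset into the world frame and add the leader position.
    return (leader_pos[0] + c * offset[0] - s * offset[1],
            leader_pos[1] + s * offset[0] + c * offset[1])
```

In a sketch like this, a follower would compute its target slot with `follower_target` and could use `los_heading` to obtain a reference course toward it. How the paper couples such guidance signals with the learned MADDPG policy is not stated in this record.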