Hierarchical Reinforcement Learning-Based End-to-End Visual Servoing With Smooth Subgoals
Published in: IEEE Transactions on Industrial Electronics (1982), 2024-09, Vol. 71 (9), pp. 11009-11018
Main authors: , , , ,
Format: Article
Language: English
Keywords:
Online access: Order full text
Abstract: Reinforcement learning (RL) offers the possibility of an end-to-end visual servoing (VS) strategy learned directly from captured images or features. However, actions become unsmooth when the RL agent depends solely on the current state. In this article, a hierarchical proximal policy optimization method is proposed for learning the VS strategy with RL. A subgoal generation function based on a sequence of historical data is designed and serves as the high-level strategy, providing a smooth subgoal for low-level policy training. The low-level policy takes the current state and the smoothed subgoal as inputs, so that historical data are taken into account. Furthermore, a novel exploration measure based on mean clustering is introduced to encourage agent exploration during learning. Autonomous visual landing experiments with a quadrotor validate the effectiveness of the proposed algorithm, and comparative experiments analyze both novelty and VS performance in different scenarios.
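The abstract only outlines the hierarchy, so the following is a minimal sketch of the described structure, not the authors' code: a high-level subgoal generator conditioned on a history of states, and a low-level policy conditioned on the current state plus the subgoal. The GRU history encoder, the exponential-moving-average smoothing, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, SUBGOAL_DIM, ACTION_DIM, HISTORY_LEN = 8, 4, 4, 10  # assumed sizes

class SubgoalGenerator(nn.Module):
    """High-level strategy: maps a history of states to a smooth subgoal."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(STATE_DIM, 64, batch_first=True)
        self.head = nn.Linear(64, SUBGOAL_DIM)

    def forward(self, history, prev_subgoal, alpha=0.8):
        # history: (batch, HISTORY_LEN, STATE_DIM)
        _, h = self.encoder(history)
        raw = torch.tanh(self.head(h[-1]))
        # Exponential smoothing keeps consecutive subgoals close together,
        # one simple way to realize the "smoothing attribute" (an assumption).
        return alpha * prev_subgoal + (1.0 - alpha) * raw

class LowLevelPolicy(nn.Module):
    """Low-level actor (as used in PPO): acts on state plus subgoal."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + SUBGOAL_DIM, 64), nn.Tanh(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())

    def forward(self, state, subgoal):
        return self.net(torch.cat([state, subgoal], dim=-1))

# Usage: generate a smoothed subgoal from the history, then act on it.
gen, pi = SubgoalGenerator(), LowLevelPolicy()
history = torch.zeros(1, HISTORY_LEN, STATE_DIM)
subgoal = torch.zeros(1, SUBGOAL_DIM)
state = torch.randn(1, STATE_DIM)
subgoal = gen(history, subgoal)
action = pi(state, subgoal)
```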
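Likewise, the mean-cluster exploration measure is only named, not specified. The sketch below shows one plausible reading under stated assumptions: running cluster means over visited states give a distance-based novelty score that can be added as an intrinsic bonus. The radius, update rate, and bounded score are assumptions, not the paper's exact formulation.

```python
import numpy as np

class MeanClusterNovelty:
    def __init__(self, radius=1.0, lr=0.05):
        self.means = []        # cluster centers of states seen so far
        self.radius = radius   # distance beyond which a new cluster is opened
        self.lr = lr           # step size for updating the nearest mean

    def score(self, state):
        """Return a novelty score and update the cluster means online."""
        state = np.asarray(state, dtype=float)
        if not self.means:
            self.means.append(state.copy())
            return 1.0
        dists = [np.linalg.norm(state - m) for m in self.means]
        i, d = int(np.argmin(dists)), min(dists)
        if d > self.radius:
            self.means.append(state.copy())   # novel region: open a cluster
        else:
            self.means[i] += self.lr * (state - self.means[i])
        return d / (d + self.radius)          # bounded novelty in [0, 1)

# Usage: add the score to the environment reward as an exploration bonus.
nov = MeanClusterNovelty()
bonus = nov.score(np.random.randn(8))
```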
ISSN: 0278-0046, 1557-9948
DOI: 10.1109/TIE.2023.3337547