Learning End-to-End Visual Servoing Using an Improved Soft Actor-Critic Approach with Centralized Novelty Measurement

Bibliographic Details
Published in: IEEE Transactions on Instrumentation and Measurement, 2023-01, Vol. 72, p. 1-1
Main authors: Gao, Jian; He, Yaozhen; Chen, Yimin; Li, Yufeng
Format: Article
Language: English
Summary: End-to-end visual servoing (VS) based on reinforcement learning (RL) can simplify the design of features and control laws, and it scales well in combination with neural networks. However, RL-based VS tasks struggle to operate in continuous state and action spaces because of the difficulty of exploration and slow training convergence. Hence, this paper presents a novelty measurement method based on centralized features extracted by a neural network, which scores the novelty of each visited state to encourage exploration by the RL agent. Moreover, we propose a hybrid probability sampling method that improves Prioritized Experience Replay (PER), which is based on temporal-difference (TD) error, by integrating the intrinsic and extrinsic rewards; these two signals represent the novelty and the quality, respectively, of transitions in the replay buffer, and they promote convergence during training. Finally, we develop an end-to-end VS scheme based on the maximum-entropy RL algorithm Soft Actor-Critic (SAC). Several simulated experiments for end-to-end VS are designed in CoppeliaSim, with target detection information serving as the agent's input. The results show that our method's reward value and completion rate are 0.35 and 8.0% higher, respectively, than those of the SAC VS baseline. We also conduct experiments to verify the effectiveness of the proposed algorithm.
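The summary couples two mechanisms: an intrinsic reward derived from the novelty of centralized state features, and a PER variant whose sampling priority blends TD error with intrinsic and extrinsic rewards. The Python sketch below illustrates one plausible reading of that design; the class names (CentralizedNovelty, HybridPriorityReplay), the running-centroid novelty score, and the linear priority mix with weights w_td, w_int, and w_ext are all illustrative assumptions, not the authors' published formulation.

import numpy as np


class CentralizedNovelty:
    """Novelty as distance from the running centroid of visited feature
    embeddings -- one illustrative reading of 'centralized features'."""

    def __init__(self, feat_dim):
        self.mean = np.zeros(feat_dim)
        self.count = 0

    def score(self, feat):
        self.count += 1
        self.mean += (feat - self.mean) / self.count    # incremental centroid
        return float(np.linalg.norm(feat - self.mean))  # novelty = distance


class HybridPriorityReplay:
    """PER-style buffer whose priority mixes TD error with intrinsic
    (novelty) and extrinsic (task) rewards via assumed weights."""

    def __init__(self, capacity, alpha=0.6, w_td=1.0, w_int=0.5, w_ext=0.5,
                 eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.w_td, self.w_int, self.w_ext = w_td, w_int, w_ext
        self.data, self.priorities = [], []

    def add(self, transition, td_error, r_int, r_ext):
        # TD error ~ learning progress, r_int ~ transition novelty,
        # r_ext ~ transition quality, as sketched in the summary.
        p = (self.w_td * abs(td_error) + self.w_int * abs(r_int)
             + self.w_ext * abs(r_ext) + self.eps)
        if len(self.data) >= self.capacity:      # drop oldest when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        scaled = np.asarray(self.priorities) ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx, probs[idx]

In a SAC training loop, CentralizedNovelty.score applied to the feature embedding of each new observation would supply r_int, serving both as an exploration bonus added to the task reward and as one term of the replay priority.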
ISSN: 0018-9456, 1557-9662
DOI: 10.1109/TIM.2023.3273687