Learning End-to-End Visual Servoing Using an Improved Soft Actor-Critic Approach with Centralized Novelty Measurement
Published in: | IEEE Transactions on Instrumentation and Measurement, 2023-01, Vol. 72, p. 1-1 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Keywords: | |
Abstract: | End-to-end visual servoing (VS) based on reinforcement learning (RL) simplifies the design of features and control laws and scales well when combined with neural networks. However, RL-based VS remains challenging in continuous state or action spaces because exploration is difficult and training converges slowly. This paper therefore presents a novelty measurement method based on centralized features extracted by a neural network; it computes the novelty of each visited state to encourage the RL agent to explore. We also propose a hybrid probability sampling method that improves temporal-difference (TD)-error-based Prioritized Experience Replay (PER) by integrating intrinsic and extrinsic rewards, which represent the novelty and the quality of each transition in the replay buffer, respectively, to promote convergence during training. Finally, we develop an end-to-end VS scheme based on Soft Actor-Critic (SAC), a maximum-entropy RL algorithm. We design several simulated end-to-end VS experiments in CoppeliaSim in which the agent's input is target detection information. The results show that, compared with the SAC-based VS baseline, our method achieves a reward value 0.35 higher and a completion rate 8.0% higher. We also conduct experiments to verify the effectiveness of the proposed algorithm. |
---|---|
ISSN: | 0018-9456, 1557-9662 |
DOI: | 10.1109/TIM.2023.3273687 |
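The abstract does not give the novelty formula. As a minimal sketch, assume that "centralized" means state features are centred by subtracting a running mean, and that a state's novelty is its per-dimension standardized distance from that centre, scaled into an intrinsic reward. `CenteredNoveltyMeter`, `intrinsic_reward`, and `beta` are hypothetical names. In the paper the features come from a neural network; here plain feature vectors (for example, a detected target's image coordinates) are passed in directly.

```python
import math


class CenteredNoveltyMeter:
    """Hypothetical novelty meter (an assumption, not the paper's formula):
    centres state features with a running mean and scores a state by its
    standardized distance from that centre."""

    def __init__(self, dim, eps=1e-8):
        self.dim = dim
        self.eps = eps
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations (Welford)

    def update(self, feat):
        """Fold one feature vector into the running mean and variance."""
        self.count += 1
        for i, x in enumerate(feat):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def novelty(self, feat):
        """Root-mean-square of the centred, variance-scaled feature."""
        if self.count < 2:
            return 1.0  # assumption: treat states as novel until statistics exist
        total = 0.0
        for i, x in enumerate(feat):
            var = self.m2[i] / (self.count - 1)  # unbiased sample variance
            centred = x - self.mean[i]
            total += centred * centred / (var + self.eps)
        return math.sqrt(total / self.dim)


def intrinsic_reward(meter, feat, beta=0.1):
    """Score the state first, then add it to the visited-state statistics."""
    r_int = beta * meter.novelty(feat)
    meter.update(feat)
    return r_int


if __name__ == "__main__":
    meter = CenteredNoveltyMeter(dim=2)
    for f in ([0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]):
        meter.update(f)
    print(meter.novelty([0.0, 0.0]))    # state at the centre of visited features
    print(meter.novelty([10.0, 10.0]))  # far-away, rarely visited state
```

Scoring before updating compares each state only against previously visited ones. A k-nearest-neighbour distance over stored features is a common alternative; the abstract does not say which form the centralized measure takes.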
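The abstract states that the hybrid sampling combines TD-error-based PER with intrinsic (novelty) and extrinsic (quality) rewards, but not how they are combined. A minimal sketch, assuming the sampling distribution is a weighted mixture of three normalised distributions (standard proportional PER priorities from |δ|, clipped intrinsic rewards, and min-shifted extrinsic rewards), could look like this. The mixture weights, `alpha`, `beta`, and all function names are hypothetical; the importance-sampling correction follows the usual PER form, normalised here by the batch maximum.

```python
import random


def _normalise(scores):
    total = sum(scores)
    return [s / total for s in scores]


def hybrid_probabilities(td_errors, r_int, r_ext,
                         weights=(0.6, 0.2, 0.2), alpha=0.6, eps=1e-6):
    """Mix TD-error, novelty and quality terms into one sampling distribution.

    The mixture form and default weights are assumptions for illustration.
    """
    # Standard proportional PER term: p_i = (|delta_i| + eps) ** alpha
    p_td = _normalise([(abs(d) + eps) ** alpha for d in td_errors])
    # Novelty term: intrinsic rewards, clipped at zero
    p_int = _normalise([max(r, 0.0) + eps for r in r_int])
    # Quality term: shift extrinsic rewards so the smallest maps to eps
    lo = min(r_ext)
    p_ext = _normalise([r - lo + eps for r in r_ext])
    w_td, w_int, w_ext = weights
    mixed = [w_td * a + w_int * b + w_ext * c
             for a, b, c in zip(p_td, p_int, p_ext)]
    return _normalise(mixed)  # renormalise in case weights do not sum to 1


def sample_batch(probs, batch_size, beta=0.4, rng=random):
    """Draw indices with replacement and return PER importance weights."""
    indices = rng.choices(range(len(probs)), weights=probs, k=batch_size)
    n = len(probs)
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    w_max = max(raw)
    return indices, [w / w_max for w in raw]  # scale so the largest weight is 1


if __name__ == "__main__":
    probs = hybrid_probabilities(
        td_errors=[0.1, 2.0, 0.5],
        r_int=[0.0, 0.5, 0.1],
        r_ext=[-1.0, 1.0, 0.0],
    )
    idx, is_weights = sample_batch(probs, batch_size=4, rng=random.Random(0))
    print(probs, idx, is_weights)
```

A mixture keeps each term a valid distribution regardless of reward scale. Summing or multiplying the raw priorities would be an equally plausible reading of "hybrid" and would weight the three signals differently.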