Learning-Based 6-DOF Control for Autonomous Proximity Operations Under Motion Constraints

Bibliographic Details
Published in: IEEE Transactions on Aerospace and Electronic Systems, 2021-12, Vol. 57 (6), pp. 4097-4109
Authors: Hu, Qinglei; Yang, Haoyang; Dong, Hongyang; Zhao, Xiaowei
Format: Article
Language: English
Abstract: This article proposes a reinforcement learning (RL)-based six-degree-of-freedom (6-DOF) control scheme for the final-phase proximity operations of spacecraft. The main novelty of the proposed method lies in two aspects: 1) the closed-loop performance can be improved in real time through the RL technique, achieving online approximate optimal control subject to the full 6-DOF nonlinear dynamics of spacecraft; 2) nontrivial motion constraints of proximity operations are considered and strictly obeyed during the whole control process. As a stepping stone, the dual-quaternion formalism is employed to characterize the 6-DOF dynamics model and the motion constraints. An RL-based control scheme is then developed under the dual-quaternion algebraic framework to approximate the optimal control solution subject to a cost function and a Hamilton–Jacobi–Bellman equation. In addition, a specially designed barrier function is embedded in the reward function to avoid motion-constraint violations. A Lyapunov-based stability analysis guarantees the ultimate boundedness of the state errors and of the neural network (NN) weight estimation errors. We also show that a PD-like controller under the dual-quaternion formulation can be employed as the initial control policy to trigger the online learning process; its boundedness is proved by a special Lyapunov strictification method. Simulation results of prototypical spacecraft proximity-operation missions are provided to illustrate the effectiveness of the proposed method.
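To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of the static building blocks: the dual-quaternion product used to compose 6-DOF poses, a PD-like wrench on the dual-quaternion error of the kind that could serve as the initial policy, and a quadratic stage cost augmented with a logarithmic barrier on a motion constraint. All function names, gains (kp, kd), and weights (Q, R, mu) here are illustrative assumptions, not taken from the paper.

import numpy as np

def quat_mul(p, q):
    # Hamilton product of quaternions given as [w, x, y, z]
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def dq_mul(a, b):
    # Dual-quaternion product (a_r + eps*a_d)(b_r + eps*b_d);
    # the real part carries attitude, the dual part carries translation
    ar, ad = a
    br, bd = b
    return quat_mul(ar, br), quat_mul(ar, bd) + quat_mul(ad, br)

def pd_like_wrench(dq_err, vel_err, kp=2.0, kd=4.0):
    # PD-like initial policy on the pose error (illustrative): proportional
    # action on the vector parts of the real (rotation) and dual
    # (translation) components of the error dual quaternion, derivative
    # action on the 6-D velocity error; returns a wrench [torque; force]
    qr_err, qd_err = dq_err
    e = np.concatenate([qr_err[1:], qd_err[1:]])
    return -kp * e - kd * vel_err

def stage_cost(x_err, u, g, Q=1.0, R=0.1, mu=0.5):
    # Quadratic running cost plus a logarithmic barrier on a motion
    # constraint g(x) > 0 (e.g. an approach corridor or pointing cone);
    # the barrier grows without bound near the constraint boundary, which
    # is how violations can be penalized inside a reward function
    return Q * float(x_err @ x_err) + R * float(u @ u) \
        - mu * np.log(max(float(g), 1e-12))

# Usage: identity pose error as a unit dual quaternion, zero velocity error
pose_err = (np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(4))
u = pd_like_wrench(pose_err, np.zeros(6))  # -> zero wrench at zero error

In the paper's scheme, an actor-critic network would refine such an initial policy online toward the solution of the Hamilton–Jacobi–Bellman equation; this sketch only fixes notation for the static pieces under the stated assumptions.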
ISSN: 0018-9251, 1557-9603
DOI: 10.1109/TAES.2021.3094628