Modular transfer learning with transition mismatch compensation for excessive disturbance rejection


Bibliographic Details
Published in: International Journal of Machine Learning and Cybernetics, 2023, Vol. 14 (1), pp. 295-311
Main authors: Wang, Tianming; Lu, Wenjie; Yu, Huan; Liu, Dikai
Format: Article
Language: English
Online access: Full text
Description
Summary: Underwater robots in shallow waters usually suffer from strong wave forces, which may frequently exceed the robot's control constraints. Learning-based controllers are suitable for disturbance rejection control, but excessive disturbances heavily affect the state transitions in a Markov Decision Process (MDP) or Partially Observable Markov Decision Process (POMDP). This issue is amplified by the mismatch between training and test models. In this paper, we propose a transfer reinforcement learning algorithm using Transition Mismatch Compensation (TMC), which learns an additional compensatory policy by minimizing the mismatch between the transitions predicted by the two dynamics models of the source and target tasks. A modular network of learning policies is applied, composed of a Generalized Control Policy (GCP) and an Online Disturbance Identification Model (ODI). The GCP is first trained over a wide array of disturbance waveforms. The ODI then learns to use past states and actions of the system to predict the disturbance waveforms, which are provided as input to the GCP along with the system state. We demonstrate on a pose regulation task in simulation that TMC successfully rejects the disturbances and stabilizes the robot under an empirical model of the robot system, while also improving sample efficiency.
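The abstract describes the modular architecture only at a high level. The sketch below is a hypothetical illustration, not the authors' implementation: an ODI network predicts the disturbance from a state-action history, a GCP maps the state and predicted disturbance to a base action, and a compensatory policy in the spirit of TMC adds a correction that would be trained against the one-step transition mismatch between source and target dynamics models. All class names, layer sizes, and the mismatch_loss helper are illustrative assumptions.

```python
# Illustrative sketch only; network names/sizes are assumptions, not the paper's code.
import torch
import torch.nn as nn

class ODI(nn.Module):
    """Online Disturbance Identification: predicts disturbance features from a (state, action) history."""
    def __init__(self, state_dim, action_dim, dist_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(state_dim + action_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, dist_dim)

    def forward(self, history):              # history: (batch, T, state_dim + action_dim)
        _, h = self.rnn(history)
        return self.head(h[-1])               # (batch, dist_dim)

class GCP(nn.Module):
    """Generalized Control Policy conditioned on the state and the predicted disturbance."""
    def __init__(self, state_dim, dist_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + dist_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim), nn.Tanh())   # bounded actions

    def forward(self, state, dist):
        return self.net(torch.cat([state, dist], dim=-1))

class CompensatoryPolicy(nn.Module):
    """Additive correction in the spirit of TMC, applied on top of the GCP action."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim))

    def forward(self, state, base_action):
        return self.net(torch.cat([state, base_action], dim=-1))

def act(odi, gcp, comp, state, history):
    """Compose the modules: GCP base action plus the learned compensatory correction."""
    dist_hat = odi(history)
    base = gcp(state, dist_hat)
    return base + comp(state, base)

def mismatch_loss(f_source, f_target, state, action):
    """One-step transition mismatch between source and target dynamics models (hypothetical training signal)."""
    return ((f_source(state, action) - f_target(state, action)) ** 2).mean()
```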
ISSN: 1868-8071; 1868-808X
DOI: 10.1007/s13042-022-01641-4