HYBRID ONLINE POLICY ADAPTATION STRATEGY FOR ATTITUDE POINTING PERFORMANCE

Detailed description

Bibliographic details
Main authors:	WATT, Mark, LACHEVRE, Pierre, GARCÍA, Carlos Hervás, PASSARIN, Federico
Format:	Patent
Language:	eng ; fre ; ger
Subjects:	AIRCRAFT ; AVIATION ; COSMONAUTICS ; PERFORMING OPERATIONS ; TRANSPORTING ; VEHICLES OR EQUIPMENT THEREFOR

Description
Summary:	This specification relates to systems, methods and apparatus for controlling a satellite using machine-learning models, and to the training of such machine-learning models. According to a first aspect of this specification, there is described a computer-implemented method for controlling a satellite, comprising: for at least one machine-learning model for controlling at least the attitude of the satellite, controlling the satellite for an episode of time using said machine-learning model. The controlling comprises: computing, by one or more processors of the satellite, a nominal control command for the satellite from data representing the current state of the satellite using a predefined nominal control model; generating, by the one or more processors and using said machine-learning model, one or more corrections to said nominal control command using the current attitude state of the satellite and said nominal control command (5.3); generating, by the one or more processors, an effective control command by applying said one or more corrections to said nominal control command; and controlling the satellite based on said effective control command. The method further comprises: evaluating, by the one or more processors, a performance of the effective control commands for the episode using a reward function providing rewards (5.4); and updating, by the one or more processors, said machine-learning model based on the rewards, wherein updates are determined using a metaheuristic optimisation algorithm (5.5).
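
The loop described in the abstract (nominal command, learned correction, effective command, episode reward, metaheuristic update) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the single-axis toy dynamics, the PD law used as the nominal control model, the linear map standing in for the machine-learning model, the squared pointing error used as the reward, and the simple accept-if-better random search standing in for the unspecified metaheuristic optimisation algorithm. All function and parameter names are hypothetical.

```python
import random


def nominal_command(theta, omega, kp=1.0, kd=0.8):
    """Nominal control model (assumed here to be a PD law on a single axis)."""
    return -kp * theta - kd * omega


def correction(weights, theta, omega, u_nom):
    """Stand-in for the machine-learning model: a linear map of the
    attitude state and the nominal command (an illustrative assumption)."""
    return weights[0] * theta + weights[1] * omega + weights[2] * u_nom


def run_episode(weights, steps=200, dt=0.05, disturbance=0.2):
    """Control a toy single-axis satellite for one episode; return total reward."""
    theta, omega = 0.5, 0.0  # initial pointing error and angular rate
    total_reward = 0.0
    for _ in range(steps):
        u_nom = nominal_command(theta, omega)
        # Effective command = nominal command + learned correction.
        u_eff = u_nom + correction(weights, theta, omega, u_nom)
        # Toy dynamics with a constant disturbance torque the PD law
        # alone does not fully reject.
        omega += (u_eff + disturbance) * dt
        theta += omega * dt
        # Reward: penalise squared pointing error (assumed reward function).
        total_reward -= theta * theta
    return total_reward


def adapt(weights, episodes=50, sigma=0.1, seed=0):
    """Accept-if-better random search, used as a simple stand-in for a
    metaheuristic optimiser: perturb the weights, run one episode, and
    keep the candidate only if its episode reward improves."""
    rng = random.Random(seed)
    best = list(weights)
    best_reward = run_episode(best)
    for _ in range(episodes):
        candidate = [w + rng.gauss(0.0, sigma) for w in best]
        reward = run_episode(candidate)
        if reward > best_reward:
            best, best_reward = candidate, reward
    return best, best_reward


if __name__ == "__main__":
    baseline = run_episode([0.0, 0.0, 0.0])  # nominal controller only
    tuned, tuned_reward = adapt([0.0, 0.0, 0.0])
    print("baseline reward:", baseline)
    print("adapted reward: ", tuned_reward)
```

Because a candidate is accepted only when its episode reward is strictly higher, the adapted reward in this sketch can never fall below the nominal-only baseline. An actual implementation would presumably use richer dynamics, a more expressive model and a population-based optimiser.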