Scheduling the NASA Deep Space Network with Deep Reinforcement Learning

With three complexes spread evenly across the Earth, NASA's Deep Space Network (DSN) is the primary means of communications as well as a significant scientific instrument for dozens of active missions around the world. A rapidly rising number of spacecraft and increasingly complex scientific in...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2021-02
Hauptverfasser:	Goh, Edwin, Hamsa Shwetha Venkataram, Hoffmann, Mark, Johnston, Mark, Wilson, Brian
Format:	Artikel
Sprache:	eng
Schlagworte:	Deep learning Deep Space Network Negotiations Optimization Schedules Scheduling Space missions Spacecraft Spacecraft tracking Spacecraft-environment interaction
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	With three complexes spread evenly across the Earth, NASA's Deep Space Network (DSN) is the primary means of communications as well as a significant scientific instrument for dozens of active missions around the world. A rapidly rising number of spacecraft and increasingly complex scientific instruments with higher bandwidth requirements have resulted in demand that exceeds the network's capacity across its 12 antennae. The existing DSN scheduling process operates on a rolling weekly basis and is time-consuming; for a given week, generation of the final baseline schedule of spacecraft tracking passes takes roughly 5 months from the initial requirements submission deadline, with several weeks of peer-to-peer negotiations in between. This paper proposes a deep reinforcement learning (RL) approach to generate candidate DSN schedules from mission requests and spacecraft ephemeris data with demonstrated capability to address real-world operational constraints. A deep RL agent is developed that takes mission requests for a given week as input, and interacts with a DSN scheduling environment to allocate tracks such that its reward signal is maximized. A comparison is made between an agent trained using Proximal Policy Optimization and its random, untrained counterpart. The results represent a proof-of-concept that, given a well-shaped reward signal, a deep RL agent can learn the complex heuristics used by experts to schedule the DSN. A trained agent can potentially be used to generate candidate schedules to bootstrap the scheduling process and thus reduce the turnaround cycle for DSN scheduling.
ISSN:	2331-8422