Failure is Not an Option: Policy Learning for Adaptive Recovery in Space Operations
Published in: IEEE Robotics and Automation Letters, July 2018, Vol. 3, No. 3, pp. 1639-1646
Main authors:
Format: Article
Language: English
Abstract: This letter considers the problem of how robots in long-term space operations can learn to choose appropriate sources of assistance to recover from failures. Current assistant selection methods for failure handling are based on manually specified static lookup tables or policies, which are not responsive to dynamic environments or to uncertainty in human performance. We describe a novel and highly flexible learning-based assistant selection framework that uses contextual multi-armed bandit algorithms. The contextual bandits exploit information from observed environment and assistant-performance variables to efficiently learn selection policies under a wide set of uncertain operating conditions and unknown or dynamically constrained assistant capabilities. Proof-of-concept simulations of long-term human-robot interactions for space exploration are used to compare the performance of the contextual bandit against other state-of-the-art assistant selection approaches. The contextual bandit outperforms conventional static policies and non-contextual learning approaches, and also demonstrates favorable robustness and scaling properties.
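To make the idea concrete, the sketch below shows one standard contextual bandit algorithm (LinUCB) applied to assistant selection: each arm is a candidate assistant, the context vector collects observed environment and assistant-performance features, and the reward is the recovery outcome. The abstract does not specify which bandit variant the letter uses, so LinUCB, the class, and all feature names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class LinUCBAssistantSelector:
    """Hypothetical LinUCB-style contextual bandit for assistant selection.

    Each arm (assistant) keeps a ridge-regression estimate of expected
    recovery reward as a linear function of the observed context, plus an
    upper-confidence exploration bonus.
    """

    def __init__(self, n_assistants, context_dim, alpha=1.0):
        self.alpha = alpha  # exploration strength
        # Per-arm ridge-regression state: Gram matrix A and response vector b.
        self.A = [np.eye(context_dim) for _ in range(n_assistants)]
        self.b = [np.zeros(context_dim) for _ in range(n_assistants)]

    def select(self, context):
        """Pick the assistant with the highest upper confidence bound."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # current estimate of the arm's reward model
            mean = theta @ context
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(mean + bonus)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Incorporate the observed recovery outcome for the chosen assistant."""
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Illustrative usage: 3 candidate assistants, 4 context features
# (lighting, comms delay, crew workload, task difficulty -- all assumed).
selector = LinUCBAssistantSelector(n_assistants=3, context_dim=4, alpha=0.5)
context = np.array([0.2, 0.9, 0.4, 0.7])
arm = selector.select(context)
reward = 1.0  # e.g., 1.0 if the failure was recovered, 0.0 otherwise
selector.update(arm, context, reward)
```

The per-arm linear model is what lets the policy adapt: as environment or assistant-performance features drift, the confidence bonus shrinks only where evidence has accumulated, so the selector keeps exploring assistants whose capabilities are uncertain or changing.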
ISSN: 2377-3766
DOI: 10.1109/LRA.2018.2801468