Apprenticeship Learning: Learning to Schedule from Human Experts

Coordinating agents to complete a set of tasks with intercoupled temporal and resource constraints is computationally challenging, yet human domain experts can solve these difficult scheduling problems using paradigms learned through years of apprenticeship. A process for manually codifying this doma...

Bibliographic details
Main authors:	Gombolay, Matthew C; Shah, Julie; Stigile, Jessica; Son, Sung-Hyun; Jensen, Reed E
Format:	Report
Language:	English
Subjects:	algorithms; apprenticeship; demonstrations; IRL (Inverse Reinforcement Learning); learning; machine learning; MDP (Markov Decision Process); NP-hard; observation; Personnel Management and Labor Relations; scheduling; supervised machine learning; task prioritization; training

Description
Summary:	Coordinating agents to complete a set of tasks with intercoupled temporal and resource constraints is computationally challenging, yet human domain experts can solve these difficult scheduling problems using paradigms learned through years of apprenticeship. A process for manually codifying this domain knowledge within a computational framework is necessary to scale beyond the one-expert, one-trainee apprenticeship model. However, a human domain expert often has difficulty describing their decision-making process, causing the codification of this knowledge to become laborious. We propose a new approach for capturing domain expert heuristics through a pairwise ranking formulation. Our approach is model-free and does not require enumerating or iterating through a large state-space. We empirically demonstrate that this approach accurately learns multi-faceted heuristics on both a synthetic data set incorporating job-shop scheduling and vehicle routing problems and a real-world data set consisting of demonstrations of experts solving a variant of the weapon-to-target assignment problem. Our approach is able to learn scheduling policies of superior quality to those generated, on average, by human experts conducting an anti-ship missile defense task.
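
The pairwise ranking idea mentioned in the summary can be illustrated with a small sketch. This is not the report's implementation: the feature layout (deadline, duration), the simulated "expert" who always picks the earliest deadline, the logistic-regression learner, and all function names are assumptions made purely for illustration. Each demonstration is turned into difference vectors between the task the expert scheduled and every task the expert passed over, and a classifier is trained on those pairs, so no state space is enumerated.

```python
import math
import random

def pairwise_examples(demonstrations):
    """Turn (candidate_tasks, chosen_index) demonstrations into labeled difference vectors."""
    examples = []
    for candidates, chosen in demonstrations:
        best = candidates[chosen]
        for j, other in enumerate(candidates):
            if j == chosen:
                continue
            diff = [b - o for b, o in zip(best, other)]
            examples.append((diff, 1))                # chosen task ranked above the other
            examples.append(([-d for d in diff], 0))  # same pair, reversed order
    return examples

def train_logistic(examples, epochs=200, lr=0.1, seed=0):
    """Fit a linear pairwise classifier with plain stochastic gradient ascent."""
    rng = random.Random(seed)
    examples = list(examples)
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi
    return w

def pick_task(w, candidates):
    """Return the index of the highest-scoring task.

    For a linear model, ranking by score is equivalent to counting pairwise wins.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, c)) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

# Hypothetical data: each task is [deadline, duration]; the simulated expert
# always schedules the task with the earliest deadline.
rng = random.Random(1)

def make_demo(n_tasks=4):
    tasks = [[rng.random(), rng.random()] for _ in range(n_tasks)]
    chosen = min(range(n_tasks), key=lambda i: tasks[i][0])
    return tasks, chosen

demos = [make_demo() for _ in range(50)]
w = train_logistic(pairwise_examples(demos))

held_out = [make_demo() for _ in range(100)]
agreement = sum(pick_task(w, t) == c for t, c in held_out) / len(held_out)
print(f"weights={w}, agreement with simulated expert={agreement:.2f}")
```

In this toy setting the learned weight on the deadline feature should come out negative, reflecting the simulated expert's preference for earlier deadlines. Any classifier that accepts difference vectors could be substituted for the logistic model.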