Solving nonstationary Markov decision processes via contextual decomposition: A military air battle management application

Reinforcement learning for nonstationary problems is a subject of widespread research given that most realistic problems do not exist within static environments. Approaching these problems can require significant effort in feature engineering to provide a learning algorithm with enough useful inform...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Expert systems with applications 2023-12, Vol.233, p.120949, Article 120949
Hauptverfasser:	Liles, Joseph M., Robbins, Matthew J., Lunday, Brian J.
Format:	Artikel
Sprache:	eng
Schlagworte:	Air battle management Context detection Dynamic assignment problem Nonstationary Markov decision process OR in defense Reinforcement learning
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Reinforcement learning for nonstationary problems is a subject of widespread research given that most realistic problems do not exist within static environments. Approaching these problems can require significant effort in feature engineering to provide a learning algorithm with enough useful information about the state space to uncover complex system dynamics. As an alternative for problems with sufficient data describing the nonstationary environment, we propose the contextual decomposition Markov decision process (CDMDP) as a collection of stationary sub-problems intended to approximate nonstationary problem dynamics using a linear combination of value functions. We demonstrate the effectiveness of the CDMDP approach with an application in military air battle management. We use a designed computational experiment and analysis of variance to show that a complex, nonstationary learning problem can be effectively approximated with a small set of stationary sub-problems, and that the CDMDP solution significantly improves solution quality over a baseline approach without the need for additional feature engineering. If a researcher suspects that a complex and continuously varying environment can be approximated by a small number of stationary contexts, the CDMDP framework may save significant computational resources and yield decision policies that are much easier to visualize and implement. •We model a military air battle scenario with a nonstationary Markov decision process.•We decompose a complex problem into a number of smaller sub-problems.•We use a computational experiment to find an effective neural network architecture.•We use a learning algorithm to develop decision policies for autonomous aircraft.•Our approach significantly improves solution quality over a baseline.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2023.120949