SMA-NBO: A Sequential Multi-Agent Planning with Nominal Belief-State Optimization in Target Tracking

In target tracking with mobile multi-sensor systems, sensor deployment impacts the observation capabilities and the resulting state estimation quality. Based on a partially observable Markov decision process (POMDP) formulation comprised of the observable sensor dynamics, unobservable target states,...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Li, Tianqi, Krakow, Lucas W, Gopalswamy, Swaminathan
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In target tracking with mobile multi-sensor systems, sensor deployment impacts the observation capabilities and the resulting state estimation quality. Based on a partially observable Markov decision process (POMDP) formulation comprised of the observable sensor dynamics, unobservable target states, and accompanying observation laws, we present a distributed information-driven solution approach to the multi-agent target tracking problem, namely, sequential multi-agent nominal belief-state optimization (SMA-NBO). SMA-NBO seeks to minimize the expected tracking error via receding horizon control including a heuristic expected cost-to-go (HECTG). SMA-NBO incorporates a computationally efficient approximation of the target belief-state over the horizon. The agent-by-agent decision-making is capable of leveraging on-board (edge) compute for selecting (sub-optimal) target-tracking maneuvers exhibiting non-myopic cooperative fleet behavior. The optimization problem explicitly incorporates semantic information defining target occlusions from a world model. To illustrate the efficacy of our approach, a random occlusion forest environment is simulated. SMA-NBO is compared to other baseline approaches. The simulation results show SMA-NBO 1) maintains tracking performance and reduces the computational cost by replacing the calculation of the expected target trajectory with a single sample trajectory based on maximum a posteriori estimation; 2) generates cooperative fleet decision by sequentially optimizing single-agent policy with efficient usage of other agents' policy of intent; 3) aptly incorporates the multiple weighted trace penalty (MWTP) HECTG, which improves tracking performance with a computationally efficient heuristic.
DOI:10.48550/arxiv.2203.01507