SMA-NBO: A Sequential Multi-Agent Planning with Nominal Belief-State Optimization in Target Tracking
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | In target tracking with mobile multi-sensor systems, sensor deployment
impacts the observation capabilities and the resulting state estimation
quality. Based on a partially observable Markov decision process (POMDP)
formulation comprising the observable sensor dynamics, unobservable target
states, and accompanying observation laws, we present a distributed
information-driven solution approach to the multi-agent target tracking
problem, namely, sequential multi-agent nominal belief-state optimization
(SMA-NBO). SMA-NBO seeks to minimize the expected tracking error via receding
horizon control including a heuristic expected cost-to-go (HECTG). SMA-NBO
incorporates a computationally efficient approximation of the target
belief-state over the horizon. The agent-by-agent decision-making is capable of
leveraging on-board (edge) compute for selecting (sub-optimal) target-tracking
maneuvers exhibiting non-myopic cooperative fleet behavior. The optimization
problem explicitly incorporates semantic information defining target occlusions
from a world model. To illustrate the efficacy of our approach, a random
occlusion forest environment is simulated. SMA-NBO is compared to other
baseline approaches. The simulation results show that SMA-NBO 1) maintains tracking
performance and reduces the computational cost by replacing the calculation of
the expected target trajectory with a single sample trajectory based on maximum
a posteriori estimation; 2) generates cooperative fleet decisions by
sequentially optimizing single-agent policies with efficient use of the other
agents' policies of intent; 3) aptly incorporates the multiple weighted trace
penalty (MWTP) HECTG, which improves tracking performance with a
computationally efficient heuristic. |
---|---|
DOI: | 10.48550/arxiv.2203.01507 |