MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Tracking by detection has been the prevailing paradigm in the field of
Multi-object Tracking (MOT). These methods typically rely on the Kalman Filter
to estimate the future locations of objects, assuming linear object motion.
However, they fall short when tracking objects exhibiting nonlinear and diverse
motion in scenarios like dancing and sports. In addition, there has been
limited focus on utilizing learning-based motion predictors in MOT. To address
these challenges, we explore data-driven motion prediction methods.
Inspired by the promise of state space models (SSMs), such as Mamba,
in long-term sequence modeling with near-linear complexity, we introduce a
Mamba-based motion model named Mamba moTion Predictor (MTP). MTP is designed to
model the complex motion patterns of objects like dancers and athletes.
Specifically, MTP takes the spatial-temporal location dynamics of objects as
input, captures the motion pattern using a bi-Mamba encoding layer, and
predicts the next motion. In real-world scenarios, objects may be missed due to
occlusion or motion blur, leading to premature termination of their
trajectories. To tackle this challenge, we further expand the application of
MTP. We employ it in an autoregressive way to compensate for missing
observations by utilizing its own predictions as inputs, thereby contributing
to more consistent trajectories. Our proposed tracker, MambaTrack, demonstrates
advanced performance on benchmarks such as DanceTrack and SportsMOT, which are
characterized by complex motion and severe occlusion. |
---|---|
DOI: | 10.48550/arxiv.2408.09178 |
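
The autoregressive compensation described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the learned bi-Mamba predictor is replaced by a simple constant-velocity stand-in so the example runs without any model weights. The box format `(cx, cy, w, h)`, the function names `constant_velocity_predictor` and `fill_missing`, and the `window` parameter are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed box format: (center x, center y, width, height).
Box = Tuple[float, float, float, float]
Predictor = Callable[[Sequence[Box]], Box]


def constant_velocity_predictor(history: Sequence[Box]) -> Box:
    """Stand-in for a learned motion predictor (NOT the MTP model).

    Extrapolates the most recent frame-to-frame displacement.
    """
    if len(history) < 2:
        return history[-1]
    prev, last = history[-2], history[-1]
    cx, cy, w, h = (l + (l - p) for p, l in zip(prev, last))
    return (cx, cy, w, h)


def fill_missing(
    observations: Sequence[Optional[Box]],
    predictor: Predictor,
    window: int = 5,
) -> List[Box]:
    """Fill gaps (None) in a track autoregressively.

    When a detection is missing, the predictor is fed the most recent
    `window` boxes of the track, which may themselves be predictions.
    """
    track: List[Box] = []
    for obs in observations:
        if obs is not None:
            track.append(obs)
        elif track:
            track.append(predictor(track[-window:]))
        else:
            raise ValueError("a track must start with an observation")
    return track


# Two missed detections (e.g. during occlusion) in the middle of a track.
observed = [
    (0.0, 0.0, 10.0, 10.0),
    (1.0, 2.0, 10.0, 10.0),
    None,
    None,
    (4.0, 8.0, 10.0, 10.0),
]
filled = fill_missing(observed, constant_velocity_predictor)
```

Because the predictor is passed in as a plain callable, a learned model that maps a window of past boxes to the next box could be substituted for the stand-in without changing the gap-filling loop.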