MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Harnessing low-light enhancement and domain adaptation, nighttime UAV
tracking has made substantial strides. However, over-reliance on image
enhancement, the scarcity of high-quality nighttime data, and neglect of the
relationship between daytime and nighttime trackers hinder the
development of an end-to-end trainable framework. Moreover, current CNN-based
trackers have limited receptive fields, leading to suboptimal performance,
while ViT-based trackers demand heavy computational resources due to their
reliance on the self-attention mechanism. In this paper, we propose a novel
pure Mamba-based tracking framework (MambaNUT) that employs a state
space model with linear complexity as its backbone, incorporating a
single-stream architecture that integrates feature learning and template-search
coupling within Vision Mamba. We introduce an adaptive curriculum learning
(ACL) approach that dynamically adjusts sampling strategies and loss weights,
thereby improving the model's generalization ability. Our ACL is composed of
two levels of curriculum schedulers: (1) a sampling scheduler that transforms the
data distribution from imbalanced to balanced, as well as from easier (daytime)
to harder (nighttime) samples; and (2) a loss scheduler that dynamically assigns
weights based on data frequency and the IoU. Extensive experiments on multiple
nighttime UAV tracking benchmarks demonstrate that the proposed MambaNUT
achieves state-of-the-art performance while requiring lower computational
costs. The code will be available. |
---|---|
DOI: | 10.48550/arxiv.2412.00626 |
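
The two-level curriculum described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the summary gives no formulas, so the function names (`curriculum_progress`, `sampling_probs`, `loss_weight`), the linear progress schedule, the exponent-based rebalancing, and the `gamma` parameter are all assumptions chosen only to show the idea of moving from imbalanced to balanced sampling, from daytime to nighttime samples, and of weighting the loss by data frequency and IoU.

```python
def curriculum_progress(epoch, total_epochs):
    """Map an epoch index to a progress value in [0, 1] (assumed linear schedule)."""
    if total_epochs <= 1:
        return 1.0
    return min(max(epoch / (total_epochs - 1), 0.0), 1.0)


def sampling_probs(counts, is_night, progress):
    """Hypothetical sampling scheduler over data groups.

    counts[i] is the number of samples in group i; is_night[i] marks
    nighttime groups. Returns one sampling probability per group.
    """
    total = sum(counts)
    scores = []
    for count, night in zip(counts, is_night):
        # Imbalanced -> balanced: exponent 1 keeps the raw frequencies,
        # exponent 0 makes every group equally likely.
        balance = (count / total) ** (1.0 - progress)
        # Easier -> harder: daytime is favoured early, nighttime late.
        difficulty = (1.0 + progress) if night else (2.0 - progress)
        scores.append(balance * difficulty)
    norm = sum(scores)
    return [s / norm for s in scores]


def loss_weight(group_freq, iou, progress, gamma=1.0):
    """Hypothetical loss scheduler: a weight from data frequency and IoU."""
    # Rare groups receive a larger weight as the curriculum advances.
    rarity = (1.0 / group_freq) ** progress
    # Samples with low IoU (poorly localised) receive a larger weight.
    hardness = (1.0 - iou) ** gamma
    return rarity * (1.0 + hardness)
```

For example, with a daytime group of 900 samples and a nighttime group of 100, `sampling_probs([900, 100], [False, True], progress)` draws mostly daytime data at `progress = 0` and shifts toward nighttime data as `progress` approaches 1. The real schedulers may combine these factors quite differently.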