Higher Performance Visual Tracking with Dual-Modal Localization
Authors: | |
---|---|
Format: | Article |
Language: | English |
Abstract: | Visual Object Tracking (VOT) requires both robustness and accuracy at the
same time. While most existing methods fail to achieve both simultaneously, in
this work we investigate the conflict between accuracy and robustness. We first
conduct a systematic comparison among existing methods and analyze their
limitations in terms of accuracy and robustness. Specifically, we consider 4
formulations, categorized by whether an online update is used and by the type
of supervision signal: offline classification (OFC), offline regression (OFR),
online classification (ONC), and online regression (ONR). To address this
problem, we draw on the idea of ensembling and propose a dual-modal framework
for target localization, consisting of a robust localization that suppresses
distractors via ONR and an accurate localization that precisely attends to the
target center via OFC. To yield the final representation (i.e., the bounding
box), we propose a simple but effective score voting strategy that incorporates
adjacent predictions so that the final representation does not commit to a
single location. Running faster than real time, the proposed method is further
validated on 8 datasets (VOT2018, VOT2019, OTB2015, NFS, UAV123, LaSOT,
TrackingNet, and GOT-10k), achieving state-of-the-art performance. |
---|---|
DOI: | 10.48550/arxiv.2103.10089 |
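The abstract describes the localization pipeline only at a high level: a robust modality (ONR) that suppresses distractors, an accurate modality (OFC) that attends to the target center, and a score voting step over adjacent predictions. The following is a minimal, hypothetical sketch of that idea on a list of candidate boxes, not the authors' implementation. The function names `dual_modal_localize` and `score_vote`, the center-distance gate `radius`, the IoU threshold, and the score-weighted averaging rule are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score_vote(boxes: Sequence[Box], scores: Sequence[float],
               iou_thresh: float = 0.5) -> Box:
    """Score-weighted average of the top-scoring box and the boxes overlapping it.

    Assumed voting rule (illustrative only): voters are the top-scoring box plus
    every box whose IoU with it is at least `iou_thresh`.
    """
    if not boxes or len(boxes) != len(scores):
        raise ValueError("need one non-negative score per box")
    best = max(range(len(boxes)), key=lambda i: scores[i])
    voters = [i for i in range(len(boxes))
              if i == best or iou(boxes[i], boxes[best]) >= iou_thresh]
    total = sum(scores[i] for i in voters)
    if total <= 0:
        return tuple(float(v) for v in boxes[best])  # no usable weights
    return tuple(sum(scores[i] * boxes[i][k] for i in voters) / total
                 for k in range(4))


def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def dual_modal_localize(boxes: Sequence[Box], robust_scores: Sequence[float],
                        accurate_scores: Sequence[float], radius: float = 10.0,
                        iou_thresh: float = 0.5) -> Box:
    """Hypothetical fusion: robust scores pick the region, accurate scores vote in it."""
    if not boxes or not (len(boxes) == len(robust_scores) == len(accurate_scores)):
        raise ValueError("need one robust and one accurate score per box")
    # 1) Robust modality (stand-in for ONR): its peak anchors the search region,
    #    which is meant to exclude far-away distractors.
    anchor = max(range(len(boxes)), key=lambda i: robust_scores[i])
    ax, ay = center(boxes[anchor])
    # 2) Keep candidates whose center lies within `radius` of the anchor's center
    #    (the anchor itself is always kept, at distance 0).
    keep = [i for i in range(len(boxes))
            if math.hypot(center(boxes[i])[0] - ax,
                          center(boxes[i])[1] - ay) <= radius]
    # 3) Accurate modality (stand-in for OFC) scores the kept candidates, and
    #    score voting averages adjacent boxes instead of committing to one.
    return score_vote([boxes[i] for i in keep],
                      [accurate_scores[i] for i in keep], iou_thresh)


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    robust = [0.9, 0.8, 0.3]     # robust modality favors the true target
    accurate = [0.6, 0.4, 0.95]  # accurate modality alone peaks on a distractor
    print(score_vote(boxes, accurate))                   # lands on the distractor
    print(dual_modal_localize(boxes, robust, accurate))  # about (0.4, 0.4, 10.4, 10.4)
```

Gating by the robust peak before voting is just one simple way to let the two modalities complement each other in this toy setting; the paper's actual combination and voting rules may differ.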