Multi-scale Deep Feature Transfer for Automatic Video Object Segmentation
Published in: Neural Processing Letters 2023-12, Vol. 55 (8), p. 11701-11719
Main authors: , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Automatic video object segmentation aims to identify a video's main object without human intervention. This task is challenging because it requires improving the synergy of feature fusion, which entails integrating motion and appearance cues. Although previous approaches have attempted to sample, propagate, and fuse these cues directly, they often suffer from misalignment: motion features focus on objects that are in motion, while appearance features tend to focus on the most salient objects. In this paper, we design a Multi-scale Deep Feature Transfer Model (MFTM) to raise the upper limit of feature synergy through mutual mapping transformations between features. We treat the fused features as participants in the feature interaction; by integrating them, we encourage and constrain the appearance and motion features to enhance their compatibility. Additionally, we adopt pairwise combinations to facilitate interaction propagation among motion cues, appearance cues, and fused features, which helps eliminate the noise interference caused by different features and improves the feature representations. We also design a Multi-layer Feature Fusion Module (MFM) to further fuse features of different scales and levels, thereby improving the robustness and accuracy of the model's predictions. We test our model on two popular benchmark datasets, DAVIS2016 and FBMS. Our $\mathcal{J}$-score reaches 83.1 on DAVIS2016 and 77.3 on FBMS. Besides, we achieve impressive scores on the $E_{\max}$, $F_{\max}$, and $\mathcal{M}$ metrics for FBMS. These results provide evidence for the effectiveness of our model.
ISSN: 1370-4621, 1573-773X
DOI: 10.1007/s11063-023-11395-x
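The abstract describes pairwise interaction among motion cues, appearance cues, and fused features via mutual mapping transformations, followed by re-fusion. Below is a minimal PyTorch sketch of that idea, assuming cross-attention as the mapping operation. The class names (PairwiseTransfer, MFTMBlock) and all layer choices are illustrative assumptions; the record does not specify the paper's actual operations.

```python
# Sketch only: cross-attention stands in for the paper's "mutual mapping
# transformation", which is not specified in this record.
import torch
import torch.nn as nn


class PairwiseTransfer(nn.Module):
    """Maps a source feature into the space of a target feature via
    cross-attention, so the two cues can constrain each other."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target, source: (B, C, H, W) feature maps of equal shape
        b, c, h, w = target.shape
        q = target.flatten(2).transpose(1, 2)   # (B, HW, C) queries
        kv = source.flatten(2).transpose(1, 2)  # (B, HW, C) keys/values
        out, _ = self.attn(q, kv, kv)
        out = self.norm(q + out)                # residual keeps the original cue
        return out.transpose(1, 2).reshape(b, c, h, w)


class MFTMBlock(nn.Module):
    """One interaction round at a single scale: appearance, motion, and fused
    features are combined pairwise, then re-fused (hypothetical structure)."""

    def __init__(self, channels: int):
        super().__init__()
        self.app_from_motion = PairwiseTransfer(channels)
        self.motion_from_app = PairwiseTransfer(channels)
        self.app_from_fused = PairwiseTransfer(channels)
        self.motion_from_fused = PairwiseTransfer(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, app, motion, fused):
        # Each stream is refined by every other stream (pairwise combinations).
        app2 = self.app_from_motion(app, motion) + self.app_from_fused(app, fused)
        mot2 = self.motion_from_app(motion, app) + self.motion_from_fused(motion, fused)
        fused2 = self.fuse(torch.cat([app2, mot2], dim=1))
        return app2, mot2, fused2


if __name__ == "__main__":
    block = MFTMBlock(channels=64)
    app = torch.randn(1, 64, 32, 32)     # appearance features
    motion = torch.randn(1, 64, 32, 32)  # motion (e.g., optical-flow) features
    fused = torch.randn(1, 64, 32, 32)   # previously fused features
    a, m, f = block(app, motion, fused)
    print(a.shape, m.shape, f.shape)     # each (1, 64, 32, 32)
```

In a full model of the kind the abstract outlines, one such block would presumably run per feature scale, with the multi-layer fusion module (MFM) then merging the per-scale fused outputs into the final segmentation prediction.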