APANet: Auto-Path Aggregation for Future Instance Segmentation Prediction

Despite the remarkable progress achieved in conventional instance segmentation, the problem of predicting instance segmentation results for unobserved future frames remains challenging due to the unobservability of future data. Existing methods mainly address this challenge by forecasting features o...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on pattern analysis and machine intelligence 2022-07, Vol.44 (7), p.3386-3403
Hauptverfasser: Hu, Jian-Fang, Sun, Jiangxin, Lin, Zihang, Lai, Jian-Huang, Zeng, Wenjun, Zheng, Wei-Shi
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Despite the remarkable progress achieved in conventional instance segmentation, the problem of predicting instance segmentation results for unobserved future frames remains challenging due to the unobservability of future data. Existing methods mainly address this challenge by forecasting features of future frames. However, these methods always treat features of multiple levels (e.g., coarse-to-fine pyramid features) independently and do not exploit them collaboratively, which results in inaccurate prediction for future frames; and moreover, such a weakness can partially hinder self-adaption of a future segmentation prediction model for different input samples. To solve this problem, we propose an adaptive aggregation approach called Auto-Path Aggregation Network (APANet), where the spatio-temporal contextual information obtained in the features of each individual level is selectively aggregated using the developed "auto-path". The "auto-path" connects each pair of features extracted at different pyramid levels for task-specific hierarchical contextual information aggregation, which enables selective and adaptive aggregation of pyramid features in accordance with different videos/frames. Our APANet can be further optimized jointly with the Mask R-CNN head as a feature decoder and a Feature Pyramid Network (FPN) feature encoder, forming a joint learning system for future instance segmentation prediction. We experimentally show that the proposed method can achieve state-of-the-art performance on three video-based instance segmentation benchmarks for future instance segmentation prediction.
ISSN:0162-8828
1939-3539
2160-9292
DOI:10.1109/TPAMI.2021.3058679