Dense Semantic Forecasting in Video by Joint Regression of Features and Feature Motion

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2023-09, Vol. 34 (9), p. 6443-6455
Authors: Saric, Josip; Vrazic, Sacha; Segvic, Sinisa
Format: Article
Language: English
Description
Abstract: Dense semantic forecasting anticipates future events in video by inferring pixel-level semantics of an unobserved future image. We present a novel approach that is applicable to various single-frame architectures and tasks. Our approach consists of two modules. The feature-to-motion (F2M) module forecasts a dense deformation field that warps past features into their future positions. The feature-to-feature (F2F) module regresses the future features directly and is, therefore, able to account for emergent scenery. The compound F2MF model decouples the effects of motion from the effects of novelty in a task-agnostic manner. We apply F2MF forecasting to the most subsampled and most abstract representation of the desired single-frame model. Our design takes advantage of deformable convolutions and spatial correlation coefficients across neighboring time instants. We perform experiments on three dense prediction tasks: semantic segmentation, instance-level segmentation, and panoptic segmentation. The results reveal state-of-the-art forecasting accuracy across all three tasks.
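The F2M/F2F decomposition described in the abstract lends itself to a compact illustration. The following PyTorch sketch is a hypothetical rendering, not the authors' implementation: the function name f2mf_blend, the bilinear grid_sample warp (the paper relies on deformable convolutions), and the sigmoid per-pixel blending weights are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def f2mf_blend(feat_past, flow, feat_f2f, blend_logits):
    """Hypothetical F2MF-style fusion: warp past features along a
    forecast deformation field (F2M) and blend the result with a
    directly regressed future feature map (F2F)."""
    n, c, h, w = feat_past.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feat_past.dtype),
        torch.arange(w, dtype=feat_past.dtype),
        indexing="ij",
    )
    # Displace the grid by the dense deformation field
    # (flow: [n, 2, h, w] with channels (dx, dy)).
    gx = xs.unsqueeze(0) + flow[:, 0]
    gy = ys.unsqueeze(0) + flow[:, 1]
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    gx = 2.0 * gx / (w - 1) - 1.0
    gy = 2.0 * gy / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)        # [n, h, w, 2]
    # F2M branch: warp past features toward their future positions.
    feat_f2m = F.grid_sample(feat_past, grid, align_corners=True)
    # Per-pixel blend of motion-based and novelty-aware forecasts.
    wgt = torch.sigmoid(blend_logits)           # [n, 1, h, w], in (0, 1)
    return wgt * feat_f2m + (1.0 - wgt) * feat_f2f

# Minimal smoke test: zero flow gives an identity warp, and zero
# logits average the two branches with equal weight.
feat_past = torch.randn(1, 128, 16, 32)
flow = torch.zeros(1, 2, 16, 32)
feat_f2f = torch.randn(1, 128, 16, 32)
out = f2mf_blend(feat_past, flow, feat_f2f, torch.zeros(1, 1, 16, 32))
assert out.shape == feat_past.shape
```

A learned per-pixel blending weight lets the model favor the F2M branch where motion explains the future frame and fall back to the F2F branch where novel scenery appears, matching the decoupling of motion from novelty described in the abstract.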
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2021.3136624