Video Region Annotation with Sparse Bounding Boxes

Bibliographic Details
Published in: International Journal of Computer Vision, 2023-03, Vol. 131 (3), pp. 717-731
Authors: Xu, Yuzheng; Wu, Yang; binti Zuraimi, Nur Sabrina; Nobuhara, Shohei; Nishino, Ko
Format: Article
Language: English
Online access: Full text
Description
Abstract: Video analysis has been moving towards more detailed interpretation (e.g., segmentation) with encouraging progress. These tasks, however, increasingly rely on training data that is densely annotated in both space and time. Since such annotation is labor-intensive, few densely annotated video datasets with detailed region boundaries exist. This work aims to resolve this dilemma by learning to automatically generate region boundaries for all frames of a video from sparsely annotated bounding boxes of target regions. We achieve this with a Volumetric Graph Convolutional Network (VGCN), which learns to iteratively find keypoints on the region boundaries using the spatio-temporal volume of surrounding appearance and motion. We show that the global optimization of VGCN leads to more accurate annotation that generalizes better. Experimental results using three recent datasets (two real and one synthetic), including ablation studies, demonstrate the effectiveness and superiority of our method.
ISSN: 0920-5691, 1573-1405
DOI: 10.1007/s11263-022-01719-0
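
The abstract describes iteratively refining boundary keypoints with a graph convolutional network over a spatio-temporal feature volume. The sketch below is only a minimal illustration of that general idea, not the authors' VGCN: the nearest-neighbour feature lookup, the single mean-aggregation graph-convolution layer, the cycle adjacency over keypoints, the random weights, and all function names (`sample_features`, `graph_conv`, `refine_keypoints`) are assumptions made for this toy example.

```python
# Illustrative sketch (not the paper's code): iterative boundary-keypoint
# refinement with a simple graph convolution over a spatio-temporal volume.
import numpy as np

def sample_features(volume, points, t):
    """Nearest-neighbour feature lookup at (x, y) keypoints in frame t.

    volume: (T, H, W, C) spatio-temporal feature volume (appearance/motion).
    points: (N, 2) keypoint coordinates in pixels.
    """
    T, H, W, C = volume.shape
    xs = np.clip(np.round(points[:, 0]).astype(int), 0, W - 1)
    ys = np.clip(np.round(points[:, 1]).astype(int), 0, H - 1)
    return volume[t, ys, xs, :]                      # (N, C)

def graph_conv(feats, adj, weight):
    """One graph-convolution layer: mean-aggregate neighbours, project, ReLU."""
    deg = adj.sum(axis=1, keepdims=True)             # node degrees
    agg = (adj @ feats) / np.maximum(deg, 1)         # mean over neighbours
    return np.maximum(agg @ weight, 0.0)

def refine_keypoints(points, volume, t, w_gcn, w_out, steps=3):
    """Iteratively move boundary keypoints by GCN-predicted 2-D offsets."""
    n = len(points)
    # Cycle adjacency: each keypoint is linked to its two boundary neighbours.
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i - 1) % n] = adj[i, (i + 1) % n] = 1.0
    for _ in range(steps):
        feats = sample_features(volume, points, t)   # (N, C)
        hidden = graph_conv(feats, adj, w_gcn)       # (N, D)
        points = points + hidden @ w_out             # per-node (dx, dy) offset
    return points

# Toy usage with random weights (in practice these would be learned).
rng = np.random.default_rng(0)
vol = rng.standard_normal((5, 64, 64, 8))            # 5 frames, 8-channel features
init = np.array([[20.0, 20.0], [40.0, 20.0], [40.0, 40.0], [20.0, 40.0]])
w1 = rng.standard_normal((8, 16)) * 0.1
w2 = rng.standard_normal((16, 2)) * 0.1
print(refine_keypoints(init, vol, t=2, w_gcn=w1, w_out=w2))
```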