Video Object Discovery and Co-Segmentation with Extremely Weak Supervision

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017-10, Vol. 39 (10), pp. 2074-2088
Authors: Le Wang, Gang Hua, Rahul Sukthankar, Jianru Xue, Zhenxing Niu, Nanning Zheng
Format: Article
Language: English
Online Access: Full text
Description
Summary: We present a spatio-temporal energy minimization formulation for simultaneous video object discovery and co-segmentation across multiple videos containing irrelevant frames. Our approach overcomes a limitation of most existing video co-segmentation methods: they perform poorly on practical videos in which the target objects are absent from many frames. Our formulation incorporates a spatio-temporal auto-context model, combined with appearance modeling, for superpixel labeling. The superpixel-level labels are propagated to the frame level through a multiple instance boosting algorithm with spatial reasoning, which identifies the frames containing the target object. Our method only needs to be bootstrapped with frame-level labels for a few video frames (usually 1 to 3) indicating whether they contain the target objects. Extensive experiments on four datasets validate the efficacy of the proposed method: 1) object segmentation from a single video on the SegTrack dataset, 2) object co-segmentation from multiple videos on a video co-segmentation dataset, and 3) joint object discovery and co-segmentation from multiple videos containing irrelevant frames on the MOViCS dataset and on XJTU-Stevens, a new dataset introduced in this paper. The proposed method compares favorably with the state of the art in all of these experiments.
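For orientation, a spatio-temporal labeling energy of the kind the summary refers to typically combines a unary term over superpixel labels with spatial and temporal smoothness terms. The following is a minimal generic sketch in LaTeX notation; the specific potentials, weights, and neighborhood sets are illustrative assumptions, not the paper's exact formulation.

% c_i: binary object/background label of superpixel i
% U: unary potential combining appearance and auto-context cues (assumed)
% N_s, N_t: spatial and temporal superpixel neighborhood sets (assumed)
% lambda_s, lambda_t: weights balancing the smoothness terms (assumed)
E(\mathbf{c}) = \sum_{i} U(c_i)
  + \lambda_s \sum_{(i,j) \in \mathcal{N}_s} V_s(c_i, c_j)
  + \lambda_t \sum_{(i,j) \in \mathcal{N}_t} V_t(c_i, c_j)

Minimizing such an energy encourages labels that fit the per-superpixel evidence while remaining consistent across adjacent superpixels within a frame and along temporal links between frames.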
ISSN: 0162-8828
1939-3539
2160-9292
DOI: 10.1109/TPAMI.2016.2612187