FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos
Main authors: | , , |
---|---|
Format: | Article |
Language: | English |
Online access: | Order full text |
Abstract: | We propose an end-to-end learning framework for segmenting generic
objects in videos. Our method learns to combine appearance and motion
information to produce pixel-level segmentation masks for all prominent
objects in videos. We formulate this task as a structured prediction problem
and design a two-stream fully convolutional neural network which fuses motion
and appearance in a unified framework. Since large-scale video datasets with
pixel-level segmentations are scarce, we show how to bootstrap weakly
annotated videos together with existing image recognition datasets for
training. Through experiments on three challenging video segmentation
benchmarks, our method substantially improves the state of the art for
segmenting generic (unseen) objects. Code and pre-trained models are
available on the project website. |
---|---|
DOI: | 10.48550/arxiv.1701.05384 |
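
The abstract describes a two-stream fully convolutional network that fuses
motion and appearance features into per-pixel object masks. Below is a minimal
PyTorch sketch of that idea; the layer sizes, the two-channel optical-flow
encoding, and the concatenation-based fusion are illustrative assumptions, not
the paper's exact architecture.

```python
# Minimal sketch of a two-stream fusion segmentation network:
# one fully convolutional stream for appearance (RGB frame), one for
# motion (optical flow), fused into a per-pixel object/background mask.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions followed by 2x spatial downsampling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class Stream(nn.Module):
    """Small fully convolutional encoder producing a coarse feature map."""
    def __init__(self, in_ch):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_ch, 64),
            conv_block(64, 128),
            conv_block(128, 256),
        )

    def forward(self, x):
        return self.features(x)  # (N, 256, H/8, W/8)


class TwoStreamFusionSeg(nn.Module):
    """Fuses appearance and motion features, upsamples to pixel logits."""
    def __init__(self):
        super().__init__()
        self.appearance = Stream(in_ch=3)  # RGB frame
        self.motion = Stream(in_ch=2)      # optical flow (dx, dy) per pixel
        # Fusion by channel concatenation + 1x1 conv; one plausible choice,
        # assumed here for illustration.
        self.fuse = nn.Conv2d(256 * 2, 256, kernel_size=1)
        self.classifier = nn.Conv2d(256, 2, kernel_size=1)  # object vs. bg

    def forward(self, frame, flow):
        a = self.appearance(frame)
        m = self.motion(flow)
        fused = torch.relu(self.fuse(torch.cat([a, m], dim=1)))
        logits = self.classifier(fused)
        # Upsample coarse logits back to input resolution.
        return nn.functional.interpolate(
            logits, size=frame.shape[-2:], mode="bilinear",
            align_corners=False,
        )


# Example: one 256x256 frame and its flow field.
model = TwoStreamFusionSeg()
frame = torch.randn(1, 3, 256, 256)
flow = torch.randn(1, 2, 256, 256)
mask_logits = model(frame, flow)  # (1, 2, 256, 256)
```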