X-Net: A Binocular Summation Network for Foreground Segmentation

Bibliographic Details
Published in: IEEE Access, 2019, Vol. 7, p. 71412-71422
Authors: Zhang, Jin; Li, Yang; Chen, Feiqiong; Pan, Zhisong; Zhou, Xingyu; Li, Yudong; Jiao, Shanshan
Format: Article
Language: English
Online access: Full text
Description
Abstract: In foreground segmentation, it is challenging to construct an effective background model that learns the spatial-temporal representation of the background. Recently, deep learning-based background models (DBMs), with their capability of extracting high-level features, have shown remarkable performance. However, the existing state-of-the-art DBMs treat video segmentation as single-image segmentation and ignore temporal cues in video sequences. To exploit temporal data sufficiently, this paper proposes, for the first time, a multi-input multi-output (MIMO) DBM framework, which is partially inspired by the binocular summation effect in human eyes. Our framework is an X-shaped network that allows the DBM to track temporal changes in a video sequence. Moreover, each output branch of our model can receive visual signals from two similar input frames simultaneously, analogous to the binocular summation mechanism. In addition, our model can be trained end-to-end using only a few training examples and without any post-processing. We evaluate our method on the largest dataset for change detection (CDnet 2014) and achieve state-of-the-art performance with an average overall F-Measure of 0.9920.
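
The abstract gives only a high-level picture of the architecture (two similar input frames, a shared X-shaped body, and two output branches that each draw on both inputs). The PyTorch sketch below is a minimal illustration of that multi-input multi-output idea, not the authors' actual X-Net: the layer choices, channel counts, and concatenation-based fusion are all assumptions made for illustration.

```python
# Illustrative sketch only: a two-input, two-output "X-shaped" network in the
# spirit of the abstract (MIMO, shared fusion of two similar frames).
# All layers, channel counts, and the fusion operation are assumptions,
# not details taken from the paper.
import torch
import torch.nn as nn


class XNetSketch(nn.Module):
    def __init__(self, in_channels=3, base=16):
        super().__init__()
        # Two encoder arms, one per input frame (the upper strokes of the "X").
        self.enc_a = nn.Sequential(
            nn.Conv2d(in_channels, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.enc_b = nn.Sequential(
            nn.Conv2d(in_channels, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Shared fusion block: both arms meet here, so each output branch
        # "sees" information from both frames (the binocular-summation idea).
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * base, 2 * base, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Two output branches, one foreground mask per input frame
        # (the lower strokes of the "X").
        self.dec_a = nn.Conv2d(2 * base, 1, 1)
        self.dec_b = nn.Conv2d(2 * base, 1, 1)

    def forward(self, frame_a, frame_b):
        fused = self.fuse(
            torch.cat([self.enc_a(frame_a), self.enc_b(frame_b)], dim=1)
        )
        # Sigmoid gives per-pixel foreground probabilities for each frame.
        return torch.sigmoid(self.dec_a(fused)), torch.sigmoid(self.dec_b(fused))


if __name__ == "__main__":
    net = XNetSketch()
    a = torch.randn(1, 3, 240, 320)  # two temporally adjacent frames
    b = torch.randn(1, 3, 240, 320)
    mask_a, mask_b = net(a, b)
    print(mask_a.shape, mask_b.shape)  # torch.Size([1, 1, 240, 320]) each
```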
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2919802