DevsNet: Deep Video Saliency Network using Short-term and Long-term Cues
Published in: | Pattern Recognition, 2020-07, Vol. 103, p. 107294, Article 107294 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | •We design a novel video saliency detection model that uses a new 3-D ConvNet and a B-ConvLSTM to extract short-term and long-term spatiotemporal cues, respectively. By combining short-term and long-term spatiotemporal features, the proposed model obtains promising performance for video saliency prediction.•We design a new two-layer B-ConvLSTM structure to extract long-term spatiotemporal features for video saliency detection. The proposed B-ConvLSTM extracts temporal information not only from previous video frames but also from subsequent frames, so the network takes both forward and backward temporal features into account.
Recently, various saliency detection methods based on deep learning techniques have been proposed for still images. However, research on saliency detection for video sequences is still limited. In this study, we introduce a novel deep learning framework for video saliency detection, named the Deep Video Saliency Network (DevsNet). DevsNet consists mainly of two components: a 3D Convolutional Network (3D-ConvNet) and a Bidirectional Convolutional Long Short-Term Memory network (B-ConvLSTM). The 3D-ConvNet learns short-term spatiotemporal information, while long-term spatiotemporal features are learned by the B-ConvLSTM. The designed B-ConvLSTM extracts temporal information not only from previous video frames but also from subsequent frames, so the model considers both forward and backward temporal information. By combining short-term and long-term spatiotemporal cues, DevsNet extracts saliency information for video sequences effectively and efficiently. Extensive experiments show that the proposed model achieves better performance in spatiotemporal saliency prediction than state-of-the-art models. |
---|---|
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2020.107294 |
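
The abstract gives enough detail to sketch the overall data flow in PyTorch. The sketch below is an illustration, not the authors' implementation: all channel widths, kernel sizes, and the 1×1×1 prediction head are assumptions, and it shows a single bidirectional ConvLSTM layer where the paper describes a two-layer B-ConvLSTM.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """A single ConvLSTM cell: the usual LSTM gates, but computed with
    2-D convolutions so spatial structure is preserved."""

    def __init__(self, in_ch: int, hid_ch: int, kernel: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution emits all four gates (input, forget, output, cell).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                               kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class BConvLSTM(nn.Module):
    """Bidirectional ConvLSTM layer: one cell scans the clip forward in
    time, another scans it backward; per-frame hidden states from both
    directions are concatenated, so each output frame carries cues from
    past *and* future frames."""

    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.fwd = ConvLSTMCell(in_ch, hid_ch)
        self.bwd = ConvLSTMCell(in_ch, hid_ch)

    def _scan(self, cell, clip):
        b, _, t, hgt, wid = clip.shape          # clip: (B, C, T, H, W)
        h = clip.new_zeros(b, cell.hid_ch, hgt, wid)
        c = torch.zeros_like(h)
        outs = []
        for step in range(t):
            h, c = cell(clip[:, :, step], (h, c))
            outs.append(h)
        return torch.stack(outs, dim=2)

    def forward(self, clip):
        fwd = self._scan(self.fwd, clip)
        bwd = self._scan(self.bwd, clip.flip(2)).flip(2)  # undo time reversal
        return torch.cat([fwd, bwd], dim=1)


# Toy end-to-end pass: a 3-D convolution stands in for the short-term
# 3D-ConvNet, the bidirectional ConvLSTM supplies long-term cues, and a
# 1x1x1 head maps the fused features to a per-frame saliency map.
clip = torch.randn(1, 3, 8, 64, 64)                       # (B, RGB, T, H, W)
short = nn.Conv3d(3, 16, kernel_size=3, padding=1)(clip)  # short-term cues
long_ = BConvLSTM(16, 16)(short)                          # long-term cues
saliency = torch.sigmoid(nn.Conv3d(32, 1, 1)(long_))      # (1, 1, 8, 64, 64)
```

The backward scan over the time-flipped clip is what gives each frame access to information from subsequent frames, matching the abstract's claim that both forward and backward temporal cues are considered.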