Bottom-Up Visual Saliency Estimation With Deep Autoencoder-Based Sparse Reconstruction
Published in: | IEEE Transactions on Neural Networks and Learning Systems, 2016-06, Vol. 27 (6), p. 1227-1240 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | Research on visual perception indicates that the human visual system is sensitive to center-surround (C-S) contrast in the bottom-up, saliency-driven attention process. Unlike traditional contrast computation based on feature differences, reconstruction-based models estimate saliency starting from the original images themselves instead of seeking certain ad hoc features. However, in existing reconstruction-based methods, the reconstruction parameters of each area are calculated independently, without taking their global correlation into account. In this paper, inspired by the powerful feature learning and data reconstruction ability of deep autoencoders, we construct a deep C-S inference network and train it with data sampled randomly from the entire image to obtain a unified reconstruction pattern for the current image. In this way, global competition in the sampling and learning processes is integrated into the nonlocal reconstruction and saliency estimation of each pixel, which achieves better detection results than models that consider local and global rarity separately. Moreover, by learning from the current scene, the proposed model performs feature extraction and interaction simultaneously and adaptively, which yields better generalization to more types of stimuli. Experimental results show that, depending on the input, the network learns distinct basic features for saliency modeling in its code layer. Furthermore, in a comprehensive evaluation on several benchmark data sets, the proposed method outperforms existing state-of-the-art algorithms. |
ISSN: | 2162-237X, 2162-2388 |
DOI: | 10.1109/TNNLS.2015.2512898 |
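
The reconstruction idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' deep C-S inference network: as a loudly simplified assumption, the deep autoencoder is replaced by a two-parameter linear predictor that reconstructs each center pixel from the mean of its 8-pixel surround. What the sketch does keep is the overall flow described above: one predictor is fitted on positions sampled at random from the whole image, and the per-pixel reconstruction residual is read as saliency. All function names (`surround_mean`, `fit_global_predictor`, `saliency_map`, `make_toy_image`) and the toy image are hypothetical and exist only for illustration.

```python
import random


def surround_mean(img, r, c):
    """Mean of the 8 neighbours of (r, c); the centre pixel is excluded."""
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr or dc:
                total += img[r + dr][c + dc]
    return total / 8.0


def fit_global_predictor(img, n_samples=200, seed=0):
    """Fit centre ~ a * surround_mean + b by least squares.

    Training pairs are drawn at random from the entire image, so a single
    (a, b) pair is shared by every pixel. A linear model is an assumption
    made here in place of a deep autoencoder.
    """
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    xs, ys = [], []
    for _ in range(n_samples):
        r, c = rng.randrange(1, h - 1), rng.randrange(1, w - 1)
        xs.append(surround_mean(img, r, c))
        ys.append(img[r][c])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var if var > 1e-12 else 0.0  # guard against a constant surround
    b = my - a * mx
    return a, b


def saliency_map(img, a, b):
    """Squared reconstruction residual per pixel; border pixels stay at 0."""
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            predicted = a * surround_mean(img, r, c) + b
            sal[r][c] = (img[r][c] - predicted) ** 2
    return sal


def make_toy_image(h=20, w=20):
    """Faint checkerboard background with one bright outlier pixel at (10, 10)."""
    img = [[0.2 + 0.02 * ((r + c) % 2) for c in range(w)] for r in range(h)]
    img[10][10] = 1.0
    return img


if __name__ == "__main__":
    image = make_toy_image()
    a, b = fit_global_predictor(image)
    sal = saliency_map(image, a, b)
    peak = max((sal[r][c], r, c) for r in range(20) for c in range(20))
    print("most salient pixel:", peak[1:], "residual:", round(peak[0], 3))
```

Because the predictor is fitted mostly on background samples, it reconstructs the regular texture well and fails on the outlier, so the residual peaks there. A nonlinear autoencoder trained on full surround patches would play the same role with a far richer reconstruction pattern.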