Bottom-Up Visual Saliency Estimation With Deep Autoencoder-Based Sparse Reconstruction

Bibliographic Details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, 2016-06, Vol. 27 (6), pp. 1227-1240
Main authors:	Xia, Chen; Qi, Fei; Shi, Guangming
Format:	Article
Language:	English
Subjects:	Adaptation models; Algorithms; Autoencoder; center-surround (C-S) difference; Computational modeling; deep learning; Estimation; Feature extraction; Image color analysis; Image reconstruction; Learning; Mathematical models; Networks; Neural networks; Reconstruction; saliency; unsupervised feature learning; Visual; Visualization

Description
Abstract:	Research on visual perception indicates that the human visual system is sensitive to center-surround (C-S) contrast in the bottom-up saliency-driven attention process. Different from the traditional contrast computation of feature difference, models based on reconstruction have emerged to estimate saliency by starting from original images themselves instead of seeking for certain ad hoc features. However, in the existing reconstruction-based methods, the reconstruction parameters of each area are calculated independently without taking their global correlation into account. In this paper, inspired by the powerful feature learning and data reconstruction ability of deep autoencoders, we construct a deep C-S inference network and train it with the data sampled randomly from the entire image to obtain a unified reconstruction pattern for the current image. In this way, global competition in sampling and learning processes can be integrated into the nonlocal reconstruction and saliency estimation of each pixel, which can achieve better detection results than the models with separate consideration on local and global rarity. Moreover, by learning from the current scene, the proposed model can achieve the feature extraction and interaction simultaneously in an adaptive way, which can form a better generalization ability to handle more types of stimuli. Experimental results show that in accordance with different inputs, the network can learn distinct basic features for saliency modeling in its code layer. Furthermore, in a comprehensive evaluation on several benchmark data sets, the proposed method can outperform the existing state-of-the-art algorithms.
ISSN:	2162-237X 2162-2388
DOI:	10.1109/TNNLS.2015.2512898
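
Illustrative sketch (not part of the catalog record): the abstract describes training one network per image on randomly sampled center-surround pairs and scoring each pixel by how poorly its center is reconstructed from its surround. The Python below is a minimal sketch of that idea only, under loud assumptions: it uses a single hidden layer rather than the deep autoencoder of the paper, a grayscale image, plain gradient descent, and hypothetical names and sizes (`extract_pairs`, `train_predictor`, `saliency_map`, `c`, `s`, `hidden`) that do not come from the article.

```python
import numpy as np

def extract_pairs(img, centers, c=1, s=3):
    """Collect (surround, center) vectors around each (row, col).

    c and s are the half-widths of the center and surround windows
    (assumed values, not taken from the paper)."""
    mask = np.ones((2 * s + 1, 2 * s + 1), dtype=bool)
    mask[s - c:s + c + 1, s - c:s + c + 1] = False  # True marks surround pixels
    X, Y = [], []
    for r, q in centers:
        win = img[r - s:r + s + 1, q - s:q + s + 1]
        X.append(win[mask])    # surround pixels are the input
        Y.append(win[~mask])   # center pixels are the target
    return np.array(X), np.array(Y)

def train_predictor(X, Y, hidden=16, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer network (a shallow stand-in for the deep
    C-S inference network) that predicts the center from the surround."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    n = len(X)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # hidden code
        E = H @ W2 + b2 - Y               # reconstruction residual
        dH = (E @ W2.T) * (1 - H ** 2)    # backpropagate through tanh
        W2 -= lr * (H.T @ E) / n
        b2 -= lr * E.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def saliency_map(img, c=1, s=3, n_samples=500, seed=0):
    """Train on random samples from the whole image, then use the
    per-pixel reconstruction error of the center as saliency."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    rows = rng.integers(s, h - s, n_samples)
    cols = rng.integers(s, w - s, n_samples)
    X, Y = extract_pairs(img, zip(rows, cols), c, s)
    W1, b1, W2, b2 = train_predictor(X, Y, seed=seed)

    # Apply the single learned mapping to every interior pixel.
    centers = [(r, q) for r in range(s, h - s) for q in range(s, w - s)]
    Xa, Ya = extract_pairs(img, centers, c, s)
    pred = np.tanh(Xa @ W1 + b1) @ W2 + b2
    err = ((pred - Ya) ** 2).sum(axis=1)

    sal = np.zeros((h, w))                # border pixels are left at zero
    sal[s:h - s, s:w - s] = err.reshape(h - 2 * s, w - 2 * s)
    return sal
```

For example, calling `saliency_map` on a flat image containing one small bright block should give a larger value at the block than in the uniform background, because the mapping learned from the whole image predicts a dark center wherever the surround is dark. That shared mapping is the toy counterpart of the "unified reconstruction pattern" mentioned in the abstract.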