Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes

Bibliographic details
Published in:	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012-06, Vol. 34 (6), pp. 1080-1091
Main authors:	Vig, E., Dorr, M., Martinetz, T., Barth, E.
Format:	Article
Language:	eng
Subjects:	Algorithms; Applied sciences; Artificial intelligence; Biological and medical sciences; Biological system modeling; Coding; Complexity; Computational modeling; Computational models of vision; Computer science, control theory, systems; computer vision; Exact sciences and technology; eye movement prediction; Eye movements; Eye Movements - physiology; Feature extraction; Fundamental and applied biological sciences. Psychology; Humans; Image color analysis; Intelligence; interest point detection; intrinsic dimension; Mathematical models; Pattern analysis; Pattern Recognition, Visual; Pattern recognition. Digital image processing. Computational geometry; Perception; Predictive models; Principal Component Analysis; Psychology. Psychoanalysis. Psychiatry; Psychology. Psychophysiology; Representations; spatiotemporal saliency; video analysis; Videos; Vision; Vision, Ocular - physiology; visual attention; Visual Perception; Visualization

Description
Abstract:	Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.
ISSN:	0162-8828 1939-3539 2160-9292
DOI:	10.1109/TPAMI.2011.198