Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes

Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by us...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence 2012-06, Vol.34 (6), p.1080-1091
Main authors: Vig, E., Dorr, M., Martinetz, T., Barth, E.
Format: Article
Language: eng
container_end_page 1091
container_issue 6
container_start_page 1080
container_title IEEE transactions on pattern analysis and machine intelligence
container_volume 34
creator Vig, E.
Dorr, M.
Martinetz, T.
Barth, E.
description Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.
doi_str_mv 10.1109/TPAMI.2011.198
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_1671438703</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6042873</ieee_id><sourcerecordid>1671438703</sourcerecordid><originalsourceid>FETCH-LOGICAL-c406t-23d6f61c66e3e6821cf279585a90e3c0015edec22f82fc648e292aaa51f368c13</originalsourceid><addsrcrecordid>eNqF0UtLAzEQB_AgitbH1YsgCyJ42ZpJstnkWHwWfIF6XmI6wcg-NNk99Nub2qrgxVNC8ptJmD8h-0DHAFSfPj1MbqdjRgHGoNUaGYHmOucF1-tkREGyXCmmtsh2jG-Ugigo3yRbjBUgpShH5Hra9sG30dvs3DeYNl1rat_Ps4eAM2_7mPWvmD2mM2ztPOtcdmf6IZg6O5-3pkl1jxZbjLtkw5k64t5q3SHPlxdPZ9f5zf3V9Gxyk1tBZZ8zPpNOgpUSOUrFwDpW6kIVRlPkNn2xwBlaxpxizkqhkGlmjCnAcaks8B1ysuz7HrqPAWNfNT5arGvTYjfECmQJgquS8v8ppWk4TAuR6NEf-tYNIU3iS2kAKfSi4XipbOhiDOiq9-AbE-YJVYs4qq84qkUcVYojFRyu2g4vDc5--Pf8EzheAROtqV0wrfXx10lBueKLlw-WziPiz7WkgqmS809yuJjv</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1009116493</pqid></control><display><type>article</type><title>Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes</title><source>IEEE Electronic Library (IEL)</source><creator>Vig, E. ; Dorr, M. ; Martinetz, T. ; Barth, E.</creator><creatorcontrib>Vig, E. ; Dorr, M. ; Martinetz, T. ; Barth, E.</creatorcontrib><description>Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. 
Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.</description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2011.198</identifier><identifier>PMID: 22516647</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>Los Alamitos, CA: IEEE</publisher><subject>Algorithms ; Applied sciences ; Artificial intelligence ; Biological and medical sciences ; Biological system modeling ; Coding ; Complexity ; Computational modeling ; Computational models of vision ; Computer science; control theory; systems ; computer vision ; Exact sciences and technology ; eye movement prediction ; Eye movements ; Eye Movements - physiology ; Feature extraction ; Fundamental and applied biological sciences. Psychology ; Humans ; Image color analysis ; Intelligence ; interest point detection ; intrinsic dimension ; Mathematical models ; Pattern analysis ; Pattern Recognition, Visual ; Pattern recognition. Digital image processing. Computational geometry ; Perception ; Predictive models ; Principal Component Analysis ; Psychology. Psychoanalysis. Psychiatry ; Psychology. Psychophysiology ; Representations ; spatiotemporal saliency ; video analysis ; Videos ; Vision ; Vision, Ocular - physiology ; visual attention ; Visual Perception ; Visualization</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2012-06, Vol.34 (6), p.1080-1091</ispartof><rights>2015 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jun 2012</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c406t-23d6f61c66e3e6821cf279585a90e3c0015edec22f82fc648e292aaa51f368c13</citedby><cites>FETCH-LOGICAL-c406t-23d6f61c66e3e6821cf279585a90e3c0015edec22f82fc648e292aaa51f368c13</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6042873$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,778,782,794,27907,27908,54741</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6042873$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=26403833$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/22516647$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Vig, E.</creatorcontrib><creatorcontrib>Dorr, M.</creatorcontrib><creatorcontrib>Martinetz, T.</creatorcontrib><creatorcontrib>Barth, E.</creatorcontrib><title>Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description>Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. 
To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.</description><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Biological and medical sciences</subject><subject>Biological system modeling</subject><subject>Coding</subject><subject>Complexity</subject><subject>Computational modeling</subject><subject>Computational models of vision</subject><subject>Computer science; control theory; systems</subject><subject>computer vision</subject><subject>Exact sciences and technology</subject><subject>eye movement prediction</subject><subject>Eye movements</subject><subject>Eye Movements - physiology</subject><subject>Feature extraction</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Humans</subject><subject>Image color analysis</subject><subject>Intelligence</subject><subject>interest point detection</subject><subject>intrinsic dimension</subject><subject>Mathematical models</subject><subject>Pattern analysis</subject><subject>Pattern Recognition, Visual</subject><subject>Pattern recognition. Digital image processing. Computational geometry</subject><subject>Perception</subject><subject>Predictive models</subject><subject>Principal Component Analysis</subject><subject>Psychology. Psychoanalysis. Psychiatry</subject><subject>Psychology. 
Psychophysiology</subject><subject>Representations</subject><subject>spatiotemporal saliency</subject><subject>video analysis</subject><subject>Videos</subject><subject>Vision</subject><subject>Vision, Ocular - physiology</subject><subject>visual attention</subject><subject>Visual Perception</subject><subject>Visualization</subject><issn>0162-8828</issn><issn>1939-3539</issn><issn>2160-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNqF0UtLAzEQB_AgitbH1YsgCyJ42ZpJstnkWHwWfIF6XmI6wcg-NNk99Nub2qrgxVNC8ptJmD8h-0DHAFSfPj1MbqdjRgHGoNUaGYHmOucF1-tkREGyXCmmtsh2jG-Ugigo3yRbjBUgpShH5Hra9sG30dvs3DeYNl1rat_Ps4eAM2_7mPWvmD2mM2ztPOtcdmf6IZg6O5-3pkl1jxZbjLtkw5k64t5q3SHPlxdPZ9f5zf3V9Gxyk1tBZZ8zPpNOgpUSOUrFwDpW6kIVRlPkNn2xwBlaxpxizkqhkGlmjCnAcaks8B1ysuz7HrqPAWNfNT5arGvTYjfECmQJgquS8v8ppWk4TAuR6NEf-tYNIU3iS2kAKfSi4XipbOhiDOiq9-AbE-YJVYs4qq84qkUcVYojFRyu2g4vDc5--Pf8EzheAROtqV0wrfXx10lBueKLlw-WziPiz7WkgqmS809yuJjv</recordid><startdate>20120601</startdate><enddate>20120601</enddate><creator>Vig, E.</creator><creator>Dorr, M.</creator><creator>Martinetz, T.</creator><creator>Barth, E.</creator><general>IEEE</general><general>IEEE Computer Society</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20120601</creationdate><title>Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes</title><author>Vig, E. ; Dorr, M. ; Martinetz, T. 
; Barth, E.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c406t-23d6f61c66e3e6821cf279585a90e3c0015edec22f82fc648e292aaa51f368c13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Biological and medical sciences</topic><topic>Biological system modeling</topic><topic>Coding</topic><topic>Complexity</topic><topic>Computational modeling</topic><topic>Computational models of vision</topic><topic>Computer science; control theory; systems</topic><topic>computer vision</topic><topic>Exact sciences and technology</topic><topic>eye movement prediction</topic><topic>Eye movements</topic><topic>Eye Movements - physiology</topic><topic>Feature extraction</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Humans</topic><topic>Image color analysis</topic><topic>Intelligence</topic><topic>interest point detection</topic><topic>intrinsic dimension</topic><topic>Mathematical models</topic><topic>Pattern analysis</topic><topic>Pattern Recognition, Visual</topic><topic>Pattern recognition. Digital image processing. Computational geometry</topic><topic>Perception</topic><topic>Predictive models</topic><topic>Principal Component Analysis</topic><topic>Psychology. Psychoanalysis. Psychiatry</topic><topic>Psychology. 
Psychophysiology</topic><topic>Representations</topic><topic>spatiotemporal saliency</topic><topic>video analysis</topic><topic>Videos</topic><topic>Vision</topic><topic>Vision, Ocular - physiology</topic><topic>visual attention</topic><topic>Visual Perception</topic><topic>Visualization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vig, E.</creatorcontrib><creatorcontrib>Dorr, M.</creatorcontrib><creatorcontrib>Martinetz, T.</creatorcontrib><creatorcontrib>Barth, E.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Vig, E.</au><au>Dorr, M.</au><au>Martinetz, T.</au><au>Barth, 
E.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><date>2012-06-01</date><risdate>2012</risdate><volume>34</volume><issue>6</issue><spage>1080</spage><epage>1091</epage><pages>1080-1091</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><abstract>Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.</abstract><cop>Los Alamitos, CA</cop><pub>IEEE</pub><pmid>22516647</pmid><doi>10.1109/TPAMI.2011.198</doi><tpages>12</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 0162-8828
ispartof IEEE transactions on pattern analysis and machine intelligence, 2012-06, Vol.34 (6), p.1080-1091
issn 0162-8828
1939-3539
2160-9292
language eng
recordid cdi_proquest_miscellaneous_1671438703
source IEEE Electronic Library (IEL)
subjects Algorithms
Applied sciences
Artificial intelligence
Biological and medical sciences
Biological system modeling
Coding
Complexity
Computational modeling
Computational models of vision
Computer science; control theory; systems
computer vision
Exact sciences and technology
eye movement prediction
Eye movements
Eye Movements - physiology
Feature extraction
Fundamental and applied biological sciences. Psychology
Humans
Image color analysis
Intelligence
interest point detection
intrinsic dimension
Mathematical models
Pattern analysis
Pattern Recognition, Visual
Pattern recognition. Digital image processing. Computational geometry
Perception
Predictive models
Principal Component Analysis
Psychology. Psychoanalysis. Psychiatry
Psychology. Psychophysiology
Representations
spatiotemporal saliency
video analysis
Videos
Vision
Vision, Ocular - physiology
visual attention
Visual Perception
Visualization
title Intrinsic Dimensionality Predicts the Saliency of Natural Dynamic Scenes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T22%3A42%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Intrinsic%20Dimensionality%20Predicts%20the%20Saliency%20of%20Natural%20Dynamic%20Scenes&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Vig,%20E.&rft.date=2012-06-01&rft.volume=34&rft.issue=6&rft.spage=1080&rft.epage=1091&rft.pages=1080-1091&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2011.198&rft_dat=%3Cproquest_RIE%3E1671438703%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1009116493&rft_id=info:pmid/22516647&rft_ieee_id=6042873&rfr_iscdi=true