CNN Fixations: An Unraveling Approach to Visualize the Discriminative Image Regions

Deep convolutional neural networks (CNNs) have revolutionized computer vision research and have seen unprecedented adoption for multiple tasks, such as classification, detection, and caption generation. However, they offer little transparency into their inner workings and are often treated as black boxes that deliver excellent performance. In this paper, we aim to alleviate this opaqueness of CNNs by providing visual explanations for the network's predictions. Our approach can analyze a variety of CNN-based models trained for computer vision applications, such as object recognition and caption generation. Unlike existing methods, we achieve this by unraveling the forward pass operation. The proposed method exploits feature dependencies across the layer hierarchy and uncovers the discriminative image locations that guide the network's predictions. We name these locations CNN fixations, loosely analogous to human eye fixations. Our approach is a generic method that requires no architectural changes, additional training, or gradient computation to compute the important image locations (CNN fixations). We demonstrate through a variety of applications that our approach is able to localize the discriminative image locations across different network architectures, diverse vision tasks, and data modalities.
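
The abstract sketches the mechanism: starting from the predicted label, the forward pass is unraveled layer by layer, at each step keeping only the units that contributed most strongly to the units already selected in the layer above, until the trace reaches image locations. Purely to illustrate that backtracking principle, below is a minimal NumPy sketch for a stack of fully connected ReLU layers. It is not the authors' implementation (the paper also traces through convolutional and pooling layers to reach pixel coordinates); the function trace_fixations, the top-k selection rule, and the toy network are assumptions made for this sketch.

    import numpy as np

    def trace_fixations(weights, layer_inputs, start_units, k=2):
        # weights[l]      : weight matrix of layer l, shape (n_out, n_in)
        # layer_inputs[l] : activations fed into layer l, shape (n_in,)
        # start_units     : output-unit indices to explain, e.g. the
        #                   predicted-class neuron of the final layer
        # k               : number of top contributors kept per unit
        units = set(start_units)
        # Walk from the last layer back toward the input.
        for W, a in zip(reversed(weights), reversed(layer_inputs)):
            prev = set()
            for i in units:
                # The forward-pass contribution of input unit j to
                # output unit i is W[i, j] * a[j]; keep only the
                # strongest positive contributors.
                contrib = W[i] * a
                for j in np.argsort(contrib)[-k:]:
                    if contrib[j] > 0:
                        prev.add(int(j))
            units = prev
        return units  # indices of the most discriminative input units

    # Toy demo: a random 8 -> 6 -> 4 -> 3 ReLU network.
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((6, 8)),
               rng.standard_normal((4, 6)),
               rng.standard_normal((3, 4))]

    x = rng.random(8)
    layer_inputs = []
    for W in weights:
        layer_inputs.append(x)
        x = np.maximum(W @ x, 0.0)

    predicted = int(np.argmax(x))
    print("fixated input units:",
          sorted(trace_fixations(weights, layer_inputs, [predicted])))

Roughly speaking, the same top-contributor rule applied within each selected neuron's receptive field is what would let such a trace descend through convolutional layers and land on pixel locations rather than feature indices, consistent with the gradient-free, forward-pass-only character described in the abstract.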

Bibliographic details
Published in: IEEE transactions on image processing, 2019-05, Vol. 28 (5), p. 2116-2125
Main authors: Mopuri, Konda Reddy; Garg, Utsav; Venkatesh Babu, R.
Format: Article
Language: English
Subjects: Artificial neural networks; Black boxes; CNN visualization; Computer architecture; Computer vision; Convolution; Explainable AI; label localization; Network architecture; Neurons; Object recognition; Task analysis; Training; visual explanations; Visualization; weakly supervised localization
Online access: Order full text
Publisher: IEEE (United States)
Source: IEEE Electronic Library (IEL)
DOI: 10.1109/TIP.2018.2881920
ISSN: 1057-7149
EISSN: 1941-0042
PMID: 30452367
CODEN: IIPRE4