Attention mechanisms in computer vision: A survey

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adju...

Detailed description


Bibliographic details
Published in:	Computational Visual Media 2022-09, Vol.8 (3), p.331-368
Main authors:	Guo, Meng-Hao; Xu, Tian-Xing; Liu, Jiang-Jiang; Liu, Zheng-Ning; Jiang, Peng-Tao; Mu, Tai-Jiang; Zhang, Song-Hai; Martin, Ralph R.; Cheng, Ming-Ming; Hu, Shi-Min
Format:	Article
Language:	eng
Subjects:	Artificial Intelligence; Computational linguistics; Computer Graphics; Computer Science; Computer vision; Deep learning; Image classification; Image processing; Image Processing and Computer Vision; Image segmentation; Language processing; Machine vision; Natural language interfaces; Neural networks; Object recognition; Review Article; Semantics; Surveys; User Interfaces and Human Computer Interaction; Video data; Visual aspects; Visual tasks
Online access:	Full text

container_end_page	368
container_issue	3
container_start_page	331
container_title	Computational Visual Media
container_volume	8
creator	Guo, Meng-Hao; Xu, Tian-Xing; Liu, Jiang-Jiang; Liu, Zheng-Ning; Jiang, Peng-Tao; Mu, Tai-Jiang; Zhang, Song-Hai; Martin, Ralph R.; Cheng, Ming-Ming; Hu, Shi-Min
description	Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
doi_str_mv	10.1007/s41095-022-0271-y
format	Article
publisher	Beijing: Tsinghua University Press
rights	The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”).
toplevel	peer_reviewed; online_resources
oa	free_for_read
tpages	38
fulltext	fulltext
identifier	ISSN: 2096-0433
ispartof	Computational Visual Media, 2022-09, Vol.8 (3), p.331-368
issn	2096-0433 2096-0662
language	eng
recordid	cdi_proquest_journals_2652733969
source	DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Springer Nature OA Free Journals
subjects	Artificial Intelligence; Computational linguistics; Computer Graphics; Computer Science; Computer vision; Deep learning; Image classification; Image processing; Image Processing and Computer Vision; Image segmentation; Language processing; Machine vision; Natural language interfaces; Neural networks; Object recognition; Review Article; Semantics; Surveys; User Interfaces and Human Computer Interaction; Video data; Visual aspects; Visual tasks
title	Attention mechanisms in computer vision: A survey
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T22%3A47%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Attention%20mechanisms%20in%20computer%20vision:%20A%20survey&rft.jtitle=Computational%20Visual%20Media&rft.au=Guo,%20Meng-Hao&rft.date=2022-09-01&rft.volume=8&rft.issue=3&rft.spage=331&rft.epage=368&rft.pages=331-368&rft.issn=2096-0433&rft.eissn=2096-0662&rft_id=info:doi/10.1007/s41095-022-0271-y&rft_dat=%3Cgale_proqu%3EA705698822%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2652733969&rft_id=info:pmid/&rft_galeid=A705698822&rfr_iscdi=true