MSANet: Multi-scale attention networks for image classification


Detailed description

Saved in:
Bibliographic details
Published in: Multimedia tools and applications 2022-10, Vol.81 (24), p.34325-34344
Main authors: Cao, Ping, Xie, Fangxin, Zhang, Shichao, Zhang, Zuping, Zhang, Jianfeng
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 34344
container_issue 24
container_start_page 34325
container_title Multimedia tools and applications
container_volume 81
creator Cao, Ping
Xie, Fangxin
Zhang, Shichao
Zhang, Zuping
Zhang, Jianfeng
description Classifying images according to the principles of human vision is a major task in computer vision, and using multi-scale information and attention mechanisms to improve classification performance is common practice. Multi-scale methods produce a more accurate feature description by fusing information from different levels, while attention-based methods let deep learning models focus on the more valuable information in an image. However, current methods usually treat obtaining multi-scale feature maps and obtaining attention weights as two separate, sequential steps. Since human eyes apply both strategies at the same time when observing objects, we propose a multi-scale attention (MSA) module. The proposed MSA module extracts attention information at different scales directly from a feature map, so the multi-scale and attention computations are completed simultaneously in one step. In the MSA module, we obtain channel and spatial attention at different scales by controlling the size of the convolution kernel used for cross-channel and cross-space information interaction. The module can easily be integrated into different convolutional neural networks to form multi-scale attention network (MSANet) architectures. We demonstrate the performance of MSANet on the CIFAR-10 and CIFAR-100 data sets. In particular, the accuracy of our ResNet-110-based model on CIFAR-10 is 94.39%. Compared with the benchmark convolution model, the proposed multi-scale attention module brings a roughly 3% increase in accuracy on CIFAR-100. Experimental results show that the proposed multi-scale attention module is superior in image classification.
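The description above states the mechanism only in prose: channel and spatial attention are computed at several scales from a single feature map by varying the kernel size of the convolutions used for cross-channel and cross-space interaction. The sketch below illustrates that idea in PyTorch; the class name, the kernel sizes (3, 5, 7), the pooled descriptors, and the averaging fusion are assumptions made for illustration, not the authors' published implementation.

```python
# Minimal sketch of a multi-scale attention (MSA) block in PyTorch.
# Kernel sizes, layer names, and the fusion scheme are assumptions for
# illustration; this record does not specify the authors' exact design.
import torch
import torch.nn as nn


class MultiScaleAttention(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Channel attention: 1-D convolutions over the pooled channel
        # descriptor, one per kernel size (cross-channel interaction).
        self.channel_convs = nn.ModuleList(
            nn.Conv1d(1, 1, k, padding=k // 2, bias=False) for k in kernel_sizes
        )
        # Spatial attention: 2-D convolutions over channel-pooled maps,
        # one per kernel size (cross-space interaction).
        self.spatial_convs = nn.ModuleList(
            nn.Conv2d(2, 1, k, padding=k // 2, bias=False) for k in kernel_sizes
        )
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention at several scales, averaged and squashed to (0, 1).
        ch_desc = self.pool(x).view(b, 1, c)                      # (B, 1, C)
        ch_att = torch.stack(
            [conv(ch_desc) for conv in self.channel_convs]
        ).mean(0)                                                 # (B, 1, C)
        x = x * torch.sigmoid(ch_att).view(b, c, 1, 1)
        # Spatial attention from mean- and max-pooled channel statistics.
        sp_desc = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )                                                         # (B, 2, H, W)
        sp_att = torch.stack(
            [conv(sp_desc) for conv in self.spatial_convs]
        ).mean(0)                                                 # (B, 1, H, W)
        return x * torch.sigmoid(sp_att)


# Example: the block preserves the input shape, so it can follow any
# convolutional stage of, e.g., a ResNet.
feats = torch.randn(8, 64, 32, 32)
msa = MultiScaleAttention(64)
print(msa(feats).shape)  # torch.Size([8, 64, 32, 32])
```

Because the output shape matches the input, such a block can be dropped after existing convolutional stages without altering the rest of the network, which is consistent with the record's claim that the module integrates easily into different CNN backbones.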
doi_str_mv 10.1007/s11042-022-12792-5
format Article
fulltext fulltext
identifier ISSN: 1380-7501
ispartof Multimedia tools and applications, 2022-10, Vol.81 (24), p.34325-34344
issn 1380-7501
1573-7721
language eng
recordid cdi_proquest_journals_2716775816
source SpringerLink Journals
subjects 1168: Deep Pattern Discovery for Big Multimedia Data
Accuracy
Artificial neural networks
Classification
Computer Communication Networks
Computer Science
Computer vision
Data Structures and Information Theory
Feature extraction
Feature maps
Image acquisition
Image classification
Machine learning
Modules
Multimedia Information Systems
Special Purpose and Application-Based Systems
title MSANet: Multi-scale attention networks for image classification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T14%3A52%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MSANet:%20Multi-scale%20attention%20networks%20for%20image%20classification&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Cao,%20Ping&rft.date=2022-10-01&rft.volume=81&rft.issue=24&rft.spage=34325&rft.epage=34344&rft.pages=34325-34344&rft.issn=1380-7501&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-022-12792-5&rft_dat=%3Cproquest_cross%3E2716775816%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2716775816&rft_id=info:pmid/&rfr_iscdi=true