Complemental Attention Multi-Feature Fusion Network for Fine-Grained Classification

Transformer-based networks have shown excellent performance in coarse-grained image classification. However, fine-grained image classification remains a challenge, because the task needs more significant regional information. As an attention mechanism, the transformer pays a...

Bibliographic Details
Published in: IEEE Signal Processing Letters, 2021, Vol. 28, p. 1983-1987
Main authors: Miao, Zhuang; Zhao, Xun; Wang, Jiabao; Li, Yang; Li, Hang
Format: Article
Language: eng
container_end_page 1987
container_issue
container_start_page 1983
container_title IEEE signal processing letters
container_volume 28
creator Miao, Zhuang
Zhao, Xun
Wang, Jiabao
Li, Yang
Li, Hang
description Transformer-based networks have shown excellent performance in coarse-grained image classification. However, fine-grained image classification remains a challenge, because the task needs more significant regional information. As an attention mechanism, the transformer attends to the most significant region while neglecting other sub-significant regions. To use more regional information, in this letter we propose a complemental attention multi-feature fusion network (CAMF), which extracts multiple attention features to obtain more effective features. In CAMF, we propose two novel modules: (i) a complemental attention module (CAM) that extracts the most salient attention feature and the complemental attention feature; and (ii) a multi-feature fusion module (MFM) that uses different branches to extract multiple regional discriminative features. Furthermore, a new feature similarity loss is proposed to measure the diversity of inter-class features. Experiments were conducted on four public fine-grained classification datasets. CAMF achieves 91.2%, 92.8%, 93.3%, and 95.3% on CUB-200-2011, Stanford Dogs, FGVC-Aircraft, and Stanford Cars, respectively. The ablation study verified that CAM and MFM can focus on more local discriminative regions and improve fine-grained classification performance.
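The abstract mentions a feature similarity loss but this record does not give its formula. As a rough illustration only, the sketch below shows one common way to penalize similarity between features from different branches: the mean pairwise cosine similarity. This is an assumption, not the authors' definition, and the names `cosine_similarity` and `feature_similarity_penalty` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def feature_similarity_penalty(branch_features):
    """Mean pairwise cosine similarity across branch feature vectors.

    Hypothetical stand-in for the loss named in the abstract: a lower
    value means the branches produce more diverse features.
    """
    pairs = list(combinations(branch_features, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy feature vectors for three branches (illustrative values only).
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
    print(feature_similarity_penalty(features))
```

Minimizing such a term alongside the classification loss would push the branches toward different regions of feature space, which matches the stated goal of extracting multiple regional discriminative features. The actual formulation is in the paper.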
doi_str_mv 10.1109/LSP.2021.3114622
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2582246966</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9546643</ieee_id><sourcerecordid>2582246966</sourcerecordid><originalsourceid>FETCH-LOGICAL-c333t-a53510cf4519504e9016c0572ffef4bb52a5002e86bd2c21bf32ce723c3cbea83</originalsourceid><addsrcrecordid>eNo9kN1LwzAUxYMoOKfvgi8FnztvbpqsfRzFTWF-wPQ5pNkNdHbtTFLE_96WDZ_O4XLOufBj7JbDjHMoHtab9xkC8pngPFOIZ2zCpcxTFIqfDx7mkBYF5JfsKoQdAOQ8lxO2Kbv9oaE9tdE0ySLGwdRdm7z0TazTJZnYe0qWfRiPrxR_Ov-VuM4ny7qldOXNINukbEwItautGcvX7MKZJtDNSafsc_n4UT6l67fVc7lYp1YIEVMjheRgXSZ5ISGjAriyIOfoHLmsqiQaCYCUq2qLFnnlBFqao7DCVmRyMWX3x92D7757ClHvut63w0uNMkfMVKHUkIJjyvouBE9OH3y9N_5Xc9AjOj2g0yM6fUI3VO6OlZqI_uOFzJTKhPgD9kxqQQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2582246966</pqid></control><display><type>article</type><title>Complemental Attention Multi-Feature Fusion Network for Fine-Grained Classification</title><source>IEEE Electronic Library (IEL)</source><creator>Miao, Zhuang ; Zhao, Xun ; Wang, Jiabao ; Li, Yang ; Li, Hang</creator><creatorcontrib>Miao, Zhuang ; Zhao, Xun ; Wang, Jiabao ; Li, Yang ; Li, Hang</creatorcontrib><description>Transformer-based architecture network has shown excellent performance in the coarse-grained image classification. However, it remains a challenge for the fine-grained image classification task, which needs more significant regional information. As one of the attention mechanisms, transformer pays attention to the most significant region while neglecting other sub-significant regions. To use more regional information, in this letter, we propose a complemental attention multi-feature fusion network (CAMF), which extracts multiple attention features to obtain more effective features. 
In CAMF, we propose two novel modules: (i) a complemental attention module (CAM) that extracts the most salient attention feature and the complemental attention feature. (ii) a multi-feature fusion module (MFM) that uses different branches to extract multiple regional discriminative features. Furthermore, a new feature similarity loss is proposed to measure the diversity of inter-class features. Experiments were conducted on four public fine-grained classification datasets. Our CAMF achieves 91.2%, 92.8%, 93.3%, 95.3% on CUB-200-2011, Stanford Dogs, FGVC-Aircraft, and Stanford Cars. The ablation study verified that CAM and MFM can focus on more local discriminative regions and improve fine-grained classification performance.</description><identifier>ISSN: 1070-9908</identifier><identifier>EISSN: 1558-2361</identifier><identifier>DOI: 10.1109/LSP.2021.3114622</identifier><identifier>CODEN: ISPLEM</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Ablation ; Attention ; Automobiles ; Classification ; Dogs ; Feature extraction ; fine-grained classification ; Image classification ; Loss measurement ; Modules ; Task analysis ; Training ; transformer ; Transformers</subject><ispartof>IEEE signal processing letters, 2021, Vol.28, p.1983-1987</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c333t-a53510cf4519504e9016c0572ffef4bb52a5002e86bd2c21bf32ce723c3cbea83</citedby><cites>FETCH-LOGICAL-c333t-a53510cf4519504e9016c0572ffef4bb52a5002e86bd2c21bf32ce723c3cbea83</cites><orcidid>0000-0003-1682-0284 ; 0000-0002-1767-6520 ; 0000-0002-3706-9912 ; 0000-0003-2289-4589</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9546643$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,4009,27902,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9546643$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Miao, Zhuang</creatorcontrib><creatorcontrib>Zhao, Xun</creatorcontrib><creatorcontrib>Wang, Jiabao</creatorcontrib><creatorcontrib>Li, Yang</creatorcontrib><creatorcontrib>Li, Hang</creatorcontrib><title>Complemental Attention Multi-Feature Fusion Network for Fine-Grained Classification</title><title>IEEE signal processing letters</title><addtitle>LSP</addtitle><description>Transformer-based architecture network has shown excellent performance in the coarse-grained image classification. However, it remains a challenge for the fine-grained image classification task, which needs more significant regional information. As one of the attention mechanisms, transformer pays attention to the most significant region while neglecting other sub-significant regions. To use more regional information, in this letter, we propose a complemental attention multi-feature fusion network (CAMF), which extracts multiple attention features to obtain more effective features. 
In CAMF, we propose two novel modules: (i) a complemental attention module (CAM) that extracts the most salient attention feature and the complemental attention feature. (ii) a multi-feature fusion module (MFM) that uses different branches to extract multiple regional discriminative features. Furthermore, a new feature similarity loss is proposed to measure the diversity of inter-class features. Experiments were conducted on four public fine-grained classification datasets. Our CAMF achieves 91.2%, 92.8%, 93.3%, 95.3% on CUB-200-2011, Stanford Dogs, FGVC-Aircraft, and Stanford Cars. The ablation study verified that CAM and MFM can focus on more local discriminative regions and improve fine-grained classification performance.</description><subject>Ablation</subject><subject>Attention</subject><subject>Automobiles</subject><subject>Classification</subject><subject>Dogs</subject><subject>Feature extraction</subject><subject>fine-grained classification</subject><subject>Image classification</subject><subject>Loss measurement</subject><subject>Modules</subject><subject>Task analysis</subject><subject>Training</subject><subject>transformer</subject><subject>Transformers</subject><issn>1070-9908</issn><issn>1558-2361</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kN1LwzAUxYMoOKfvgi8FnztvbpqsfRzFTWF-wPQ5pNkNdHbtTFLE_96WDZ_O4XLOufBj7JbDjHMoHtab9xkC8pngPFOIZ2zCpcxTFIqfDx7mkBYF5JfsKoQdAOQ8lxO2Kbv9oaE9tdE0ySLGwdRdm7z0TazTJZnYe0qWfRiPrxR_Ov-VuM4ny7qldOXNINukbEwItautGcvX7MKZJtDNSafsc_n4UT6l67fVc7lYp1YIEVMjheRgXSZ5ISGjAriyIOfoHLmsqiQaCYCUq2qLFnnlBFqao7DCVmRyMWX3x92D7757ClHvut63w0uNMkfMVKHUkIJjyvouBE9OH3y9N_5Xc9AjOj2g0yM6fUI3VO6OlZqI_uOFzJTKhPgD9kxqQQ</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Miao, Zhuang</creator><creator>Zhao, Xun</creator><creator>Wang, Jiabao</creator><creator>Li, Yang</creator><creator>Li, 
Hang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-1682-0284</orcidid><orcidid>https://orcid.org/0000-0002-1767-6520</orcidid><orcidid>https://orcid.org/0000-0002-3706-9912</orcidid><orcidid>https://orcid.org/0000-0003-2289-4589</orcidid></search><sort><creationdate>2021</creationdate><title>Complemental Attention Multi-Feature Fusion Network for Fine-Grained Classification</title><author>Miao, Zhuang ; Zhao, Xun ; Wang, Jiabao ; Li, Yang ; Li, Hang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c333t-a53510cf4519504e9016c0572ffef4bb52a5002e86bd2c21bf32ce723c3cbea83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Ablation</topic><topic>Attention</topic><topic>Automobiles</topic><topic>Classification</topic><topic>Dogs</topic><topic>Feature extraction</topic><topic>fine-grained classification</topic><topic>Image classification</topic><topic>Loss measurement</topic><topic>Modules</topic><topic>Task analysis</topic><topic>Training</topic><topic>transformer</topic><topic>Transformers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Miao, Zhuang</creatorcontrib><creatorcontrib>Zhao, Xun</creatorcontrib><creatorcontrib>Wang, Jiabao</creatorcontrib><creatorcontrib>Li, Yang</creatorcontrib><creatorcontrib>Li, Hang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library 
(IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE signal processing letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Miao, Zhuang</au><au>Zhao, Xun</au><au>Wang, Jiabao</au><au>Li, Yang</au><au>Li, Hang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Complemental Attention Multi-Feature Fusion Network for Fine-Grained Classification</atitle><jtitle>IEEE signal processing letters</jtitle><stitle>LSP</stitle><date>2021</date><risdate>2021</risdate><volume>28</volume><spage>1983</spage><epage>1987</epage><pages>1983-1987</pages><issn>1070-9908</issn><eissn>1558-2361</eissn><coden>ISPLEM</coden><abstract>Transformer-based architecture network has shown excellent performance in the coarse-grained image classification. However, it remains a challenge for the fine-grained image classification task, which needs more significant regional information. As one of the attention mechanisms, transformer pays attention to the most significant region while neglecting other sub-significant regions. To use more regional information, in this letter, we propose a complemental attention multi-feature fusion network (CAMF), which extracts multiple attention features to obtain more effective features. In CAMF, we propose two novel modules: (i) a complemental attention module (CAM) that extracts the most salient attention feature and the complemental attention feature. 
(ii) a multi-feature fusion module (MFM) that uses different branches to extract multiple regional discriminative features. Furthermore, a new feature similarity loss is proposed to measure the diversity of inter-class features. Experiments were conducted on four public fine-grained classification datasets. Our CAMF achieves 91.2%, 92.8%, 93.3%, 95.3% on CUB-200-2011, Stanford Dogs, FGVC-Aircraft, and Stanford Cars. The ablation study verified that CAM and MFM can focus on more local discriminative regions and improve fine-grained classification performance.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/LSP.2021.3114622</doi><tpages>5</tpages><orcidid>https://orcid.org/0000-0003-1682-0284</orcidid><orcidid>https://orcid.org/0000-0002-1767-6520</orcidid><orcidid>https://orcid.org/0000-0002-3706-9912</orcidid><orcidid>https://orcid.org/0000-0003-2289-4589</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1070-9908
ispartof IEEE signal processing letters, 2021, Vol.28, p.1983-1987
issn 1070-9908
1558-2361
language eng
recordid cdi_proquest_journals_2582246966
source IEEE Electronic Library (IEL)
subjects Ablation
Attention
Automobiles
Classification
Dogs
Feature extraction
fine-grained classification
Image classification
Loss measurement
Modules
Task analysis
Training
transformer
Transformers
title Complemental Attention Multi-Feature Fusion Network for Fine-Grained Classification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T08%3A57%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Complemental%20Attention%20Multi-Feature%20Fusion%20Network%20for%20Fine-Grained%20Classification&rft.jtitle=IEEE%20signal%20processing%20letters&rft.au=Miao,%20Zhuang&rft.date=2021&rft.volume=28&rft.spage=1983&rft.epage=1987&rft.pages=1983-1987&rft.issn=1070-9908&rft.eissn=1558-2361&rft.coden=ISPLEM&rft_id=info:doi/10.1109/LSP.2021.3114622&rft_dat=%3Cproquest_RIE%3E2582246966%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2582246966&rft_id=info:pmid/&rft_ieee_id=9546643&rfr_iscdi=true