Complemental Attention Multi-Feature Fusion Network for Fine-Grained Classification
Transformer-based architectures have shown excellent performance in coarse-grained image classification. However, fine-grained image classification remains a challenge, as it requires more significant regional information. As one of the attention mechanisms, the transformer pays a...
Saved in:
Published in: | IEEE signal processing letters 2021, Vol.28, p.1983-1987 |
---|---|
Main authors: | Miao, Zhuang; Zhao, Xun; Wang, Jiabao; Li, Yang; Li, Hang |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1987 |
---|---|
container_issue | |
container_start_page | 1983 |
container_title | IEEE signal processing letters |
container_volume | 28 |
creator | Miao, Zhuang; Zhao, Xun; Wang, Jiabao; Li, Yang; Li, Hang |
description | Transformer-based architectures have shown excellent performance in coarse-grained image classification. However, fine-grained image classification remains a challenge, as it requires more significant regional information. As one of the attention mechanisms, the transformer attends to the most significant region while neglecting other sub-significant regions. To exploit more regional information, in this letter we propose a complemental attention multi-feature fusion network (CAMF), which extracts multiple attention features to obtain more effective features. In CAMF, we propose two novel modules: (i) a complemental attention module (CAM) that extracts the most salient attention feature and the complemental attention feature, and (ii) a multi-feature fusion module (MFM) that uses different branches to extract multiple regional discriminative features. Furthermore, a new feature similarity loss is proposed to measure the diversity of inter-class features. Experiments were conducted on four public fine-grained classification datasets. Our CAMF achieves 91.2%, 92.8%, 93.3%, and 95.3% accuracy on CUB-200-2011, Stanford Dogs, FGVC-Aircraft, and Stanford Cars, respectively. The ablation study verified that CAM and MFM focus on more local discriminative regions and improve fine-grained classification performance. |
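The abstract's core idea — keeping not only the most-attended region but also the regions attention would otherwise neglect — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function name, the top-k split, and the mean pooling are all illustrative assumptions about how a salient/complemental split over regional features might look.

```python
import numpy as np

def complemental_attention(features, attention, k=1):
    """Illustrative split of regional features into a salient part and a
    complemental part, based on attention scores (hypothetical sketch,
    not the CAMF paper's actual CAM module).

    features:  (N, D) array of N regional feature vectors
    attention: (N,) attention scores over the regions
    k:         number of top-attended regions treated as "most salient"
    """
    order = np.argsort(attention)[::-1]       # region indices, most attended first
    salient_idx = order[:k]                   # top-k most significant regions
    complement_idx = order[k:]                # sub-significant regions attention neglects
    salient = features[salient_idx].mean(axis=0)      # most salient attention feature
    complement = features[complement_idx].mean(axis=0)  # complemental attention feature
    return salient, complement
```

A fusion module in this spirit would then combine the two pooled vectors (e.g. by concatenation) so the classifier sees discriminative evidence from more than one region.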
doi_str_mv | 10.1109/LSP.2021.3114622 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1070-9908 |
ispartof | IEEE signal processing letters, 2021, Vol.28, p.1983-1987 |
issn | 1070-9908; 1558-2361 |
language | eng |
recordid | cdi_proquest_journals_2582246966 |
source | IEEE Electronic Library (IEL) |
subjects | Ablation; Attention; Automobiles; Classification; Dogs; Feature extraction; fine-grained classification; Image classification; Loss measurement; Modules; Task analysis; Training; transformer; Transformers |
title | Complemental Attention Multi-Feature Fusion Network for Fine-Grained Classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T08%3A57%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Complemental%20Attention%20Multi-Feature%20Fusion%20Network%20for%20Fine-Grained%20Classification&rft.jtitle=IEEE%20signal%20processing%20letters&rft.au=Miao,%20Zhuang&rft.date=2021&rft.volume=28&rft.spage=1983&rft.epage=1987&rft.pages=1983-1987&rft.issn=1070-9908&rft.eissn=1558-2361&rft.coden=ISPLEM&rft_id=info:doi/10.1109/LSP.2021.3114622&rft_dat=%3Cproquest_RIE%3E2582246966%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2582246966&rft_id=info:pmid/&rft_ieee_id=9546643&rfr_iscdi=true |