Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks

Fusing features from different sources is a critical aspect of many computer vision tasks. Existing approaches can be roughly categorized as parameter-free or learnable operations. However, parameter-free modules are limited in their ability to benefit from offline learning, leading to poor performance in some challenging situations. Learnable fusing methods are often space- and time-consuming, particularly when fusing features with different shapes. To address these shortcomings, we conducted an in-depth analysis of the limitations of both kinds of fusion methods. Based on our findings, we propose a generalized module named the Asymmetric Convolution Module (ACM). This module can learn to encode effective priors during offline training and efficiently fuse feature maps with different shapes in specific tasks. Specifically, we propose a mathematically equivalent method for replacing costly convolutions on concatenated features, which can be widely applied to fuse feature maps of different shapes. Furthermore, unlike parameter-free operations that can only fuse two features of the same type, ACM is general, flexible, and can fuse multiple features of different types. To demonstrate the generality and efficiency of ACM, we integrate it into several state-of-the-art models on three representative vision tasks. Extensive experimental results on three tasks and several datasets demonstrate that our new module brings significant improvements and noteworthy efficiency.
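The abstract's "mathematically equivalent method for replacing costly convolutions on concatenated features" builds on a standard identity: convolution is linear over input channels, so convolving a channel-wise concatenation of two feature maps with one kernel equals summing two convolutions whose kernels are the corresponding channel slices of that kernel. The following is a minimal PyTorch sketch of that identity only; the tensor shapes and variable names are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn.functional as F

    # Two feature maps to fuse, each with 8 channels.
    a = torch.randn(1, 8, 16, 16)
    b = torch.randn(1, 8, 16, 16)

    # One 3x3 kernel spanning all 16 concatenated input channels,
    # producing 32 output channels.
    w = torch.randn(32, 16, 3, 3)

    # Standard fusion: concatenate along the channel axis, then convolve.
    fused = F.conv2d(torch.cat([a, b], dim=1), w, padding=1)

    # Equivalent form: split the kernel along its input-channel axis,
    # convolve each feature map separately, and add the results.
    w_a, w_b = w[:, :8], w[:, 8:]
    fused_split = F.conv2d(a, w_a, padding=1) + F.conv2d(b, w_b, padding=1)

    # The two results agree up to floating-point rounding.
    assert torch.allclose(fused, fused_split, atol=1e-5)

Because the split form never materializes the concatenated tensor, each branch can be computed on its own input (and potentially cached or applied to inputs of different spatial shapes), which is presumably the efficiency gain the abstract refers to.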

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024-11, Vol. 46 (11), p. 7363-7376
Main Authors: Han, Wencheng; Dong, Xingping; Zhang, Yiyuan; Crandall, David; Xu, Cheng-Zhong; Shen, Jianbing
Format: Article
Language: English
Subjects: Asymmetric convolution; Convolution; Feature extraction; feature maps; Fuses; fusing features; Shape; Target tracking; Task analysis; vision tasks; Visualization
Online Access: Order full text
DOI: 10.1109/TPAMI.2024.3400873
ISSN: 0162-8828
EISSN: 1939-3539; 2160-9292
PMID: 38743545
CODEN: ITPIDJ
Publisher: IEEE (United States)
Source: IEEE Electronic Library (IEL)