Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks
Fusing features from different sources is a critical aspect of many computer vision tasks. Existing approaches can be roughly categorized as parameter-free or learnable operations. However, parameter-free modules are limited in their ability to benefit from offline learning, leading to poor performance in some challenging situations. Learnable fusing methods are often space-consuming and time-consuming, particularly when fusing features with different shapes. To address these shortcomings, we conducted an in-depth analysis of the limitations associated with both fusion methods. Based on our findings, we propose a generalized module named Asymmetric Convolution Module (ACM). This module can learn to encode effective priors during offline training and efficiently fuse feature maps with different shapes in specific tasks. Specifically, we propose a mathematically equivalent method for replacing costly convolutions on concatenated features. This method can be widely applied to fuse feature maps across different shapes. Furthermore, distinguished from parameter-free operations that can only fuse two features of the same type, our ACM is general, flexible, and can fuse multiple features of different types. To demonstrate the generality and efficiency of ACM, we integrate it into several state-of-the-art models on three representative vision tasks. Extensive experimental results on three tasks and several datasets demonstrate that our new module can bring significant improvements and noteworthy efficiency.
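The "mathematically equivalent" replacement the abstract describes rests on the linearity of convolution: a convolution applied to channel-concatenated features equals the sum of convolutions of each input with the corresponding input-channel slice of the kernel, Conv([A; B], W) = Conv(A, W_A) + Conv(B, W_B) with W = [W_A; W_B] along the input-channel axis. The paper's exact ACM formulation is not reproduced in this record; the sketch below only verifies the general identity for a 1×1 convolution (a per-pixel linear map over channels), with all array names and sizes chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W_sp = 4, 4          # spatial size
Ca, Cb, Cout = 3, 5, 2  # input channel counts and output channels

A = rng.standard_normal((Ca, H, W_sp))
B = rng.standard_normal((Cb, H, W_sp))
K = rng.standard_normal((Cout, Ca + Cb))  # 1x1 kernel over concatenated channels

def conv1x1(x, k):
    """1x1 convolution: per-pixel linear map over the channel dimension."""
    return np.einsum('oc,chw->ohw', k, x)

# Fused path: concatenate first, then convolve the combined tensor.
fused = conv1x1(np.concatenate([A, B], axis=0), K)

# Asymmetric path: split the kernel along input channels and convolve
# each feature map separately, then add the results.
K_a, K_b = K[:, :Ca], K[:, Ca:]
asym = conv1x1(A, K_a) + conv1x1(B, K_b)

assert np.allclose(fused, asym)  # the two paths are mathematically equivalent
```

The asymmetric path avoids materializing the concatenated tensor, which is what makes the decomposition attractive when the inputs have different shapes or come from different sources.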
Saved in:
Published in: | IEEE transactions on pattern analysis and machine intelligence 2024-11, Vol.46 (11), p.7363-7376 |
---|---|
Main authors: | Han, Wencheng ; Dong, Xingping ; Zhang, Yiyuan ; Crandall, David ; Xu, Cheng-Zhong ; Shen, Jianbing |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 7376 |
---|---|
container_issue | 11 |
container_start_page | 7363 |
container_title | IEEE transactions on pattern analysis and machine intelligence |
container_volume | 46 |
creator | Han, Wencheng Dong, Xingping Zhang, Yiyuan Crandall, David Xu, Cheng-Zhong Shen, Jianbing |
description | Fusing features from different sources is a critical aspect of many computer vision tasks. Existing approaches can be roughly categorized as parameter-free or learnable operations. However, parameter-free modules are limited in their ability to benefit from offline learning, leading to poor performance in some challenging situations. Learnable fusing methods are often space-consuming and time-consuming, particularly when fusing features with different shapes. To address these shortcomings, we conducted an in-depth analysis of the limitations associated with both fusion methods. Based on our findings, we propose a generalized module named Asymmetric Convolution Module (ACM). This module can learn to encode effective priors during offline training and efficiently fuse feature maps with different shapes in specific tasks. Specifically, we propose a mathematically equivalent method for replacing costly convolutions on concatenated features. This method can be widely applied to fuse feature maps across different shapes. Furthermore, distinguished from parameter-free operations that can only fuse two features of the same type, our ACM is general, flexible, and can fuse multiple features of different types. To demonstrate the generality and efficiency of ACM, we integrate it into several state-of-the-art models on three representative vision tasks. Extensive experimental results on three tasks and several datasets demonstrate that our new module can bring significant improvements and noteworthy efficiency. |
doi_str_mv | 10.1109/TPAMI.2024.3400873 |
format | Article |
fullrecord | PMID: 38743545 ; ieee_id: 10530458 ; ORCID iDs: 0000-0003-2656-3082, 0000-0003-1613-9288, 0000-0001-9480-0356, 0000-0001-6643-9698, 0000-0002-5827-5344 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0162-8828 |
ispartof | IEEE transactions on pattern analysis and machine intelligence, 2024-11, Vol.46 (11), p.7363-7376 |
issn | 0162-8828 1939-3539 1939-3539 2160-9292 |
language | eng |
recordid | cdi_proquest_miscellaneous_3055452455 |
source | IEEE Electronic Library (IEL) |
subjects | Asymmetric convolution ; Convolution ; Feature extraction ; feature maps ; Fuses ; fusing features ; Shape ; Target tracking ; Task analysis ; vision tasks ; Visualization |
title | Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T11%3A35%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Asymmetric%20Convolution:%20An%20Efficient%20and%20Generalized%20Method%20to%20Fuse%20Feature%20Maps%20in%20Multiple%20Vision%20Tasks&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Han,%20Wencheng&rft.date=2024-11-01&rft.volume=46&rft.issue=11&rft.spage=7363&rft.epage=7376&rft.pages=7363-7376&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2024.3400873&rft_dat=%3Cproquest_RIE%3E3055452455%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3055452455&rft_id=info:pmid/38743545&rft_ieee_id=10530458&rfr_iscdi=true |