Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks
Fusing features from different sources is a critical aspect of many computer vision tasks. Existing approaches can be roughly categorized as parameter-free or learnable operations. However, parameter-free modules are limited in their ability to benefit from offline learning, leading to poor performance in some challenging situations. Learnable fusing methods are often space-consuming and time-consuming, particularly when fusing features with different shapes. To address these shortcomings, we conducted an in-depth analysis of the limitations associated with both fusion methods. Based on our findings, we propose a generalized module named Asymmetric Convolution Module (ACM). This module can learn to encode effective priors during offline training and efficiently fuse feature maps with different shapes in specific tasks. Specifically, we propose a mathematically equivalent method for replacing costly convolutions on concatenated features. This method can be widely applied to fuse feature maps across different shapes. Furthermore, distinguished from parameter-free operations that can only fuse two features of the same type, our ACM is general, flexible, and can fuse multiple features of different types. To demonstrate the generality and efficiency of ACM, we integrate it into several state-of-the-art models on three representative vision tasks. Extensive experimental results on three tasks and several datasets demonstrate that our new module can bring significant improvements and noteworthy efficiency.
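The "mathematically equivalent" replacement the abstract describes rests on the linearity of convolution: a convolution applied to channel-concatenated features equals the sum of convolutions of each input with the corresponding input-channel slice of the kernel, Conv([A; B], W) = Conv(A, W_A) + Conv(B, W_B) with W = [W_A; W_B] along the input-channel axis. The paper's exact ACM formulation is not reproduced in this record; the sketch below only verifies the general identity for a 1×1 convolution (a per-pixel linear map over channels), with all array names and sizes chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W_sp = 4, 4          # spatial size
Ca, Cb, Cout = 3, 5, 2  # input channel counts and output channels

A = rng.standard_normal((Ca, H, W_sp))
B = rng.standard_normal((Cb, H, W_sp))
K = rng.standard_normal((Cout, Ca + Cb))  # 1x1 kernel over concatenated channels

def conv1x1(x, k):
    """1x1 convolution: per-pixel linear map over the channel dimension."""
    return np.einsum('oc,chw->ohw', k, x)

# Fused path: concatenate first, then convolve the combined tensor.
fused = conv1x1(np.concatenate([A, B], axis=0), K)

# Asymmetric path: split the kernel along input channels and convolve
# each feature map separately, then add the results.
K_a, K_b = K[:, :Ca], K[:, Ca:]
asym = conv1x1(A, K_a) + conv1x1(B, K_b)

assert np.allclose(fused, asym)  # the two paths are mathematically equivalent
```

The asymmetric path avoids materializing the concatenated tensor, which is what makes the decomposition attractive when the inputs have different shapes or come from different sources.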
Saved in:
Published in: | IEEE transactions on pattern analysis and machine intelligence 2024-11, Vol.46 (11), p.7363-7376 |
---|---|
Main authors: | Han, Wencheng ; Dong, Xingping ; Zhang, Yiyuan ; Crandall, David ; Xu, Cheng-Zhong ; Shen, Jianbing |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 7376 |
---|---|
container_issue | 11 |
container_start_page | 7363 |
container_title | IEEE transactions on pattern analysis and machine intelligence |
container_volume | 46 |
creator | Han, Wencheng Dong, Xingping Zhang, Yiyuan Crandall, David Xu, Cheng-Zhong Shen, Jianbing |
description | Fusing features from different sources is a critical aspect of many computer vision tasks. Existing approaches can be roughly categorized as parameter-free or learnable operations. However, parameter-free modules are limited in their ability to benefit from offline learning, leading to poor performance in some challenging situations. Learnable fusing methods are often space-consuming and time-consuming, particularly when fusing features with different shapes. To address these shortcomings, we conducted an in-depth analysis of the limitations associated with both fusion methods. Based on our findings, we propose a generalized module named Asymmetric Convolution Module (ACM). This module can learn to encode effective priors during offline training and efficiently fuse feature maps with different shapes in specific tasks. Specifically, we propose a mathematically equivalent method for replacing costly convolutions on concatenated features. This method can be widely applied to fuse feature maps across different shapes. Furthermore, distinguished from parameter-free operations that can only fuse two features of the same type, our ACM is general, flexible, and can fuse multiple features of different types. To demonstrate the generality and efficiency of ACM, we integrate it into several state-of-the-art models on three representative vision tasks. Extensive experimental results on three tasks and several datasets demonstrate that our new module can bring significant improvements and noteworthy efficiency. |
doi_str_mv | 10.1109/TPAMI.2024.3400873 |
format | Article |
fullrecord | PMID: 38743545 ; ieee_id: 10530458 ; ORCID iDs: 0000-0003-2656-3082, 0000-0003-1613-9288, 0000-0001-9480-0356, 0000-0001-6643-9698, 0000-0002-5827-5344 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0162-8828 |
ispartof | IEEE transactions on pattern analysis and machine intelligence, 2024-11, Vol.46 (11), p.7363-7376 |
issn | 0162-8828 1939-3539 1939-3539 2160-9292 |
language | eng |
recordid | cdi_proquest_miscellaneous_3055452455 |
source | IEEE Electronic Library (IEL) |
subjects | Asymmetric convolution ; Convolution ; Feature extraction ; feature maps ; Fuses ; fusing features ; Shape ; Target tracking ; Task analysis ; vision tasks ; Visualization |
title | Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T11%3A35%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Asymmetric%20Convolution:%20An%20Efficient%20and%20Generalized%20Method%20to%20Fuse%20Feature%20Maps%20in%20Multiple%20Vision%20Tasks&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Han,%20Wencheng&rft.date=2024-11-01&rft.volume=46&rft.issue=11&rft.spage=7363&rft.epage=7376&rft.pages=7363-7376&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2024.3400873&rft_dat=%3Cproquest_RIE%3E3055452455%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3055452455&rft_id=info:pmid/38743545&rft_ieee_id=10530458&rfr_iscdi=true |