Rethinking Self-Attention for Multispectral Object Detection
Data from different modalities, such as infrared and visible images, can offer complementary information, and integrating such information can significantly enhance the capabilities of a system to perceive and recognize its surroundings. Thus, multi-modal object detection has widespread applications...
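Published in: | IEEE transactions on intelligent transportation systems 2024-11, Vol.25 (11), p.16300-16311 |
---|---|
Main authors: | Hu, Sijie; Bonardi, Fabien; Bouchafa, Samia; Prendinger, Helmut; Sidibe, Desire |
Format: | Article |
Language: | English |
Subjects: | attention; Complexity theory; Computer Science; Deep learning; Feature extraction; Infrared imaging; intermediate fusion; Multispectral; Multispectral imaging; Object detection; Robustness; YOLO |
Online access: | Order full text |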
container_end_page | 16311 |
---|---|
container_issue | 11 |
container_start_page | 16300 |
container_title | IEEE transactions on intelligent transportation systems |
container_volume | 25 |
creator | Hu, Sijie; Bonardi, Fabien; Bouchafa, Samia; Prendinger, Helmut; Sidibe, Desire |
description | Data from different modalities, such as infrared and visible images, can offer complementary information, and integrating such information can significantly enhance the capabilities of a system to perceive and recognize its surroundings. Thus, multi-modal object detection has widespread applications, particularly in challenging weather conditions like low-light scenarios. The core of multi-modal fusion lies in developing a reasonable fusion strategy, which can fully exploit the complementary features of different modalities while preventing a significant increase in model complexity. To this end, this paper proposes a novel lightweight cross-fusion module named Channel-Patch Cross Fusion (CPCF), which leverages Channel-wise Cross-Attention (CCA), Patch-wise Cross-Attention (PCA) and Adaptive Gating (AG) to encourage mutual rectification among different modalities. This process simultaneously explores commonalities across modalities while maintaining the uniqueness of each modality. Furthermore, we design a versatile intermediate fusion framework that can leverage CPCF to enhance the performance of multi-modal object detection. The proposed method is extensively evaluated on multiple public multi-modal datasets, namely FLIR, LLVIP, and DroneVehicle. The experiments indicate that our method yields consistent performance gains across various benchmarks and can be extended to different types of detectors, further demonstrating its robustness and generalizability. Our codes are available at https://github.com/Superjie13/CPCF_Multispectral . |
doi_str_mv | 10.1109/TITS.2024.3412417 |
format | Article |
fullrecord | Raw XML source record (sourceid hal_RIE; recordid TN_cdi_ieee_primary_10565297; ieee_id 10565297; sourcerecordid oai_HAL_hal_04620359v1; source type: Open Access Repository; record type: article). Title, creators, abstract, subjects, ISSN, volume, issue and pages match the other rows of this table. Additional fields: EISSN 1558-0016; CODEN ITISFG; publisher IEEE; short title TITS; rights: Distributed under a Creative Commons Attribution 4.0 International License; peer_reviewed; open access: free_for_read; tpages 12; creation date 2024-11-01; ORCIDs: https://orcid.org/0000-0002-5843-7139, https://orcid.org/0000-0002-8518-2856, https://orcid.org/0000-0003-4654-9835, https://orcid.org/0000-0002-3555-7306, https://orcid.org/0000-0002-2860-8128; collections: IEEE All-Society Periodicals Package (ASPP) 2005-present, IEEE All-Society Periodicals Package (ASPP) 1998-Present, IEEE Electronic Library (IEL), CrossRef, Hyper Article en Ligne (HAL), Hyper Article en Ligne (HAL) (Open Access); delivery category: Remote Search Resource; links: https://ieeexplore.ieee.org/document/10565297 (view record in IEEE), https://hal.science/hal-04620359 (view record in HAL, free to read) |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1524-9050 |
ispartof | IEEE transactions on intelligent transportation systems, 2024-11, Vol.25 (11), p.16300-16311 |
issn | 1524-9050; 1558-0016 |
language | eng |
recordid | cdi_ieee_primary_10565297 |
source | IEEE Electronic Library (IEL) |
subjects | attention; Complexity theory; Computer Science; Deep learning; Feature extraction; Infrared imaging; intermediate fusion; Multispectral; Multispectral imaging; Object detection; Robustness; YOLO |
title | Rethinking Self-Attention for Multispectral Object Detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T08%3A13%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-hal_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Rethinking%20Self-Attention%20for%20Multispectral%20Object%20Detection&rft.jtitle=IEEE%20transactions%20on%20intelligent%20transportation%20systems&rft.au=Hu,%20Sijie&rft.date=2024-11-01&rft.volume=25&rft.issue=11&rft.spage=16300&rft.epage=16311&rft.pages=16300-16311&rft.issn=1524-9050&rft.eissn=1558-0016&rft.coden=ITISFG&rft_id=info:doi/10.1109/TITS.2024.3412417&rft_dat=%3Chal_RIE%3Eoai_HAL_hal_04620359v1%3C/hal_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10565297&rfr_iscdi=true |