RGB-T Object Detection With Failure Scenarios

Currently, RGB-thermal (RGB-T) object detection algorithms demonstrate excellent performance, but modality failure caused by fog, strong light, sensor damage, and similar conditions can significantly degrade a detector's accuracy. This paper proposes a multimodal object detection method, the diffusion enhanced object detection network (DENet), which addresses modality failure caused by non-routine environments, sensor anomalies, and other factors while suppressing redundant information in multimodal data to improve model accuracy. First, we design a multidimensional incremental information generation module based on a diffusion model, which reconstructs the unstable information in RGB-T images through the reverse diffusion process, conditioned on the original fused feature map. To further address redundant information in existing RGB-T object detection models, a redundant information suppression module is introduced that minimizes cross-modal redundancy using mutual information and a contrastive loss. Finally, a kernel similarity-aware illumination module (KSIM) dynamically adjusts the weighting of RGB and thermal features by incorporating both illumination intensity and the similarity between modalities; KSIM fine-tunes the contribution of each modality during fusion, ensuring a more precise balance that improves recognition performance across diverse conditions. Experimental results on the DroneVehicle and VEDAI datasets show that DENet performs strongly in multimodal object detection, effectively improving detection accuracy and reducing the impact of modality failure on performance.
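The record contains no code; as a rough, hypothetical sketch of two of the ideas the abstract describes, the PyTorch snippet below implements (i) an illumination- and similarity-aware weighting of RGB and thermal features, loosely in the spirit of KSIM, with a Gaussian kernel assumed as the similarity measure, and (ii) an InfoNCE-style contrastive loss as a stand-in for the mutual-information redundancy objective. All class and function names, the brightness-based gating head, and the even split under high similarity are illustrative assumptions, not the authors' implementation; the diffusion-based reconstruction module is omitted.

```python
# Hypothetical sketch, not the authors' code: illumination- and
# similarity-aware RGB/thermal fusion plus a contrastive redundancy loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KSIMLikeFusion(nn.Module):
    """Fuses RGB and thermal feature maps with per-sample weights driven
    by (a) an illumination score predicted from the RGB image and (b) a
    Gaussian-kernel similarity between pooled modality descriptors.
    Both ingredients are assumptions inferred from the abstract."""

    def __init__(self, bandwidth: float = 1.0):
        super().__init__()
        self.bandwidth = bandwidth
        # Tiny head mapping mean RGB brightness to a score in (0, 1).
        self.illum_head = nn.Sequential(
            nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1)
        )

    def forward(self, feat_rgb, feat_t, img_rgb):
        # feat_rgb, feat_t: (B, C, H, W); img_rgb: (B, 3, H0, W0)
        brightness = img_rgb.mean(dim=(1, 2, 3)).unsqueeze(1)   # (B, 1)
        illum = torch.sigmoid(self.illum_head(brightness))      # (B, 1)

        # Gaussian-kernel similarity between global modality descriptors.
        v_rgb = F.adaptive_avg_pool2d(feat_rgb, 1).flatten(1)   # (B, C)
        v_t = F.adaptive_avg_pool2d(feat_t, 1).flatten(1)       # (B, C)
        dist_sq = ((v_rgb - v_t) ** 2).sum(dim=1, keepdim=True)
        sim = torch.exp(-dist_sq / (2 * self.bandwidth ** 2))   # (B, 1)

        # When the modalities agree, split the weight evenly; when they
        # disagree, trust RGB in good light and thermal otherwise.
        w_rgb = sim * 0.5 + (1.0 - sim) * illum
        w_rgb = w_rgb.view(-1, 1, 1, 1)
        return w_rgb * feat_rgb + (1.0 - w_rgb) * feat_t


def redundancy_suppression_loss(v_rgb, v_t, temperature=0.1):
    """Contrastive (InfoNCE-style) surrogate for the mutual-information
    objective described in the abstract: paired RGB/thermal descriptors
    are pulled together, non-paired combinations pushed apart."""
    v_rgb = F.normalize(v_rgb, dim=1)
    v_t = F.normalize(v_t, dim=1)
    logits = v_rgb @ v_t.t() / temperature                      # (B, B)
    targets = torch.arange(v_rgb.size(0), device=v_rgb.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    fusion = KSIMLikeFusion()
    feat_rgb = torch.randn(2, 64, 32, 32)
    feat_t = torch.randn(2, 64, 32, 32)
    img_rgb = torch.rand(2, 3, 256, 256)
    fused = fusion(feat_rgb, feat_t, img_rgb)
    loss = redundancy_suppression_loss(feat_rgb.mean((2, 3)), feat_t.mean((2, 3)))
    print(fused.shape, loss.item())
```

The design guess here is that agreement between modalities makes the weighting uncritical, while disagreement should route weight toward RGB under strong illumination and toward thermal otherwise; the paper's actual KSIM formulation may differ.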


Bibliographic Details
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024-12, p. 1-12
Main authors: Wang, Qingwang; Sun, Yuxuan; Chi, Yongke; Shen, Tao
Format: Article
Language: English
Subjects: Accuracy; Attention mechanisms; Data mining; Diffusion models; Feature extraction; Fuses; Kernel methods; Lighting; Multimodal remote sensing; Mutual information; Object detection; Redundancy; RGB-thermal images
Online access: Full text
DOI: 10.1109/JSTARS.2024.3523408
ISSN: 1939-1404
EISSN: 2151-1535