\(\mathbf{C}^2\)Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection

Object detection on visible (RGB) and infrared (IR) images, as an emerging solution to facilitate robust detection for around-the-clock applications, has received extensive attention in recent years. With the help of IR images, object detectors have been more reliable and robust in practical applications by using RGB-IR combined information.

Full description

Saved in:
Bibliographic Details
Published in: arXiv.org 2024-03
Main authors: Yuan, Maoxun; Wei, Xingxing
Format: Article
Language: English
Subjects:
Online access: Full text
container_title arXiv.org
creator Yuan, Maoxun; Wei, Xingxing
description Object detection on visible (RGB) and infrared (IR) images, as an emerging solution to facilitate robust detection for around-the-clock applications, has received extensive attention in recent years. With the help of IR images, object detectors have been more reliable and robust in practical applications by using RGB-IR combined information. However, existing methods still suffer from modality miscalibration and fusion imprecision problems. Since the transformer has a powerful capability to model the pairwise correlations between different features, in this paper we propose a novel Calibrated and Complementary Transformer called \(\mathrm{C}^2\)Former to address these two problems simultaneously. In \(\mathrm{C}^2\)Former, we design an Inter-modality Cross-Attention (ICA) module to obtain calibrated and complementary features by learning the cross-attention relationship between the RGB and IR modalities. To reduce the computational cost caused by computing global attention in ICA, an Adaptive Feature Sampling (AFS) module is introduced to decrease the dimension of the feature maps. Because \(\mathrm{C}^2\)Former operates in the feature domain, it can be embedded into existing RGB-IR object detectors via the backbone network. Thus, one single-stage and one two-stage object detector, both incorporating our \(\mathrm{C}^2\)Former, are constructed to evaluate its effectiveness and versatility. With extensive experiments on the DroneVehicle and KAIST RGB-IR datasets, we verify that our method can fully utilize the RGB-IR complementary information and achieve robust detection results. The code is available at https://github.com/yuanmaoxun/Calibrated-and-Complementary-Transformer-for-RGB-Infrared-Object-Detection.git.
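The abstract describes two mechanisms: an Inter-modality Cross-Attention (ICA) module in which RGB and IR features attend to each other, and an Adaptive Feature Sampling (AFS) module that shrinks the feature maps before global attention to keep the cost manageable. The following is a minimal, hypothetical PyTorch sketch of that idea, not the authors' released implementation: it uses standard multi-head cross-attention between flattened RGB and IR feature maps, and average pooling as a simple stand-in for adaptive feature sampling. All class and parameter names (InterModalityCrossAttention, sample_ratio) are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InterModalityCrossAttention(nn.Module):
    # Illustrative sketch of ICA-style cross-attention; AFS is approximated
    # here by average pooling controlled by sample_ratio.
    def __init__(self, dim: int, num_heads: int = 8, sample_ratio: float = 0.25):
        super().__init__()
        # Each modality queries the other modality's features.
        self.rgb_to_ir = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ir_to_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.sample_ratio = sample_ratio

    def _sample(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) -> downsample spatially, then flatten to (B, H'*W', C)
        b, c, h, w = feat.shape
        h2 = max(1, int(h * self.sample_ratio))
        w2 = max(1, int(w * self.sample_ratio))
        feat = F.adaptive_avg_pool2d(feat, (h2, w2))
        return feat.flatten(2).transpose(1, 2)

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        # rgb, ir: (B, C, H, W) backbone feature maps from the two modalities.
        q_rgb, q_ir = self._sample(rgb), self._sample(ir)
        # RGB tokens query IR tokens (and vice versa) to gather the
        # calibrated, complementary context described in the abstract.
        rgb_ctx, _ = self.rgb_to_ir(q_rgb, q_ir, q_ir)
        ir_ctx, _ = self.ir_to_rgb(q_ir, q_rgb, q_rgb)
        return rgb_ctx, ir_ctx

In use, such a block would sit between the RGB and IR backbone stages of a detector; for example, with 256-channel feature maps and 8 heads, the returned rgb_ctx and ir_ctx token sequences could be reshaped back onto the (downsampled) spatial grid and fused before the detection head. The exact fusion and upsampling strategy in the paper may differ.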
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-03
issn 2331-8422
language eng
recordid cdi_proquest_journals_2831115150
source Free E-Journals
subjects Adaptive sampling
Calibration
Computer networks
Detectors
Feature maps
Infrared imagery
Infrared imaging
Modules
Object recognition
Robustness (mathematics)
Transformers
title \(\mathbf{C}^2\)Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection