Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks
Saved in:
Published in: | IEEE Robotics and Automation Letters, 2023-07, Vol. 8 (7), p. 1-8 |
---|---|
Main authors: | Liang, Mingjian; Hu, Junjie; Bao, Chenyu; Feng, Hua; Deng, Fuqin; Lam, Tin Lun |
Format: | Article |
Language: | English |
Subjects: | Color imagery; Crowd monitoring; Data mining; Decoding; Feature extraction; Fuses; Image segmentation; Multi-modality data fusion; Object detection; Object recognition; Perception; RGB-Thermal fusion; RGB-thermal perception; Salience; Semantic segmentation; Task analysis |
Online access: | Request full text |
container_end_page | 8 |
---|---|
container_issue | 7 |
container_start_page | 1 |
container_title | IEEE robotics and automation letters |
container_volume | 8 |
creator | Liang, Mingjian ; Hu, Junjie ; Bao, Chenyu ; Feng, Hua ; Deng, Fuqin ; Lam, Tin Lun |
description | RGB-Thermal-based perception has recently shown significant advances. Thermal information provides useful cues when visual cameras suffer from poor lighting conditions, such as low light and fog. However, how to effectively fuse RGB images and thermal data remains an open challenge. Previous works rely on naive fusion strategies, such as merging the modalities at the input, concatenating multi-modality features inside models, or applying attention to each modality separately. These strategies are straightforward yet insufficient. In this paper, we propose a novel fusion method named Explicit Attention-Enhanced Fusion (EAEF) that fully exploits each type of data. Specifically, we consider three cases: i) both RGB and thermal data generate discriminative features, ii) only one modality does, and iii) neither does. EAEF uses one branch to enhance feature extraction for cases i) and iii), and another branch to remedy insufficient representations for case ii). The outputs of the two branches are fused to form complementary features. As a result, the proposed fusion method outperforms the state of the art by 1.6% mIoU on semantic segmentation, 3.1% MAE on salient object detection, 2.3% mAP on object detection, and 8.1% MAE on crowd counting. The code is available at https://github.com/FreeformRobotics/EAEFNet . |
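The abstract's core idea — one attention branch enhancing features where the modalities are jointly discriminative, and a second branch compensating when only a single modality is informative — can be sketched in a few lines. The following is a minimal illustrative sketch in PyTorch, not the authors' EAEF implementation (see https://github.com/FreeformRobotics/EAEFNet for the official code); the module name, the shared squeeze-and-excitation-style attention MLP, and the product/complement interaction between the two attention maps are assumptions made for illustration.

```python
# Hedged sketch: a two-branch attention fusion in the spirit of the abstract.
# NOT the authors' EAEF implementation; shapes and the interaction scheme
# are illustrative assumptions.
import torch
import torch.nn as nn


class TwoBranchAttentionFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared channel-attention MLP (squeeze-and-excitation style).
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # Per-modality channel attention in [0, 1], shape (B, C, 1, 1).
        a_rgb = torch.sigmoid(self.mlp(self.pool(rgb)))
        a_th = torch.sigmoid(self.mlp(self.pool(thermal)))
        # Enhancement branch: weight is large only where BOTH modalities
        # are confident (product of the two attention maps).
        both = a_rgb * a_th
        enhanced = rgb * both + thermal * both
        # Compensation branch: the complement weight is large where the
        # modalities disagree; each input is gated by its own attention,
        # so the single confident modality dominates.
        only_one = 1.0 - both
        compensated = rgb * a_rgb * only_one + thermal * a_th * only_one
        # Fuse the two branches into complementary features.
        return self.fuse(torch.cat([enhanced, compensated], dim=1))


# Usage: fuse 64-channel feature maps from an RGB and a thermal encoder stage.
if __name__ == "__main__":
    fusion = TwoBranchAttentionFusion(channels=64)
    f_rgb = torch.randn(2, 64, 32, 32)
    f_th = torch.randn(2, 64, 32, 32)
    print(fusion(f_rgb, f_th).shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch the product `a_rgb * a_th` handles case i) (both modalities discriminative) and its complement routes weight to whichever single modality is confident, mirroring case ii); case iii) from the abstract (neither modality discriminative) has no dedicated path here and is one of the aspects the paper's actual EAEF design addresses.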
doi_str_mv | 10.1109/LRA.2023.3272269 |
format | Article |
eissn | 2377-3766 |
coden | IRALC6 |
publisher | Piscataway: IEEE |
ieee_id | 10113725 |
orcidid | 0000-0002-6363-1446; 0000-0002-1911-4361; 0000-0002-7071-7184; 0000-0002-8098-9095 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2377-3766 |
ispartof | IEEE robotics and automation letters, 2023-07, Vol.8 (7), p.1-8 |
issn | 2377-3766 2377-3766 |
language | eng |
recordid | cdi_proquest_journals_2823194287 |
source | IEEE Electronic Library (IEL) |
subjects | Color imagery; Crowd monitoring; Data mining; Decoding; Feature extraction; Fuses; Image segmentation; Multi-modality data fusion; Object detection; Object recognition; Perception; RGB-Thermal fusion; RGB-thermal perception; Salience; Semantic segmentation; Task analysis |
title | Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T22%3A04%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Explicit%20Attention-Enhanced%20Fusion%20for%20RGB-Thermal%20Perception%20Tasks&rft.jtitle=IEEE%20robotics%20and%20automation%20letters&rft.au=Liang,%20Mingjian&rft.date=2023-07-01&rft.volume=8&rft.issue=7&rft.spage=1&rft.epage=8&rft.pages=1-8&rft.issn=2377-3766&rft.eissn=2377-3766&rft.coden=IRALC6&rft_id=info:doi/10.1109/LRA.2023.3272269&rft_dat=%3Cproquest_RIE%3E2823194287%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2823194287&rft_id=info:pmid/&rft_ieee_id=10113725&rfr_iscdi=true |