Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks

Bibliographic details

Published in: IEEE Robotics and Automation Letters, 2023-07, Vol. 8 (7), p. 1-8
Authors: Liang, Mingjian; Hu, Junjie; Bao, Chenyu; Feng, Hua; Deng, Fuqin; Lam, Tin Lun
Publisher: IEEE (Piscataway)
Format: Article
Language: English
Abstract: Recently, RGB-Thermal based perception has shown significant advances. Thermal information provides useful clues when visual cameras suffer from poor lighting conditions, such as low light and fog. However, how to effectively fuse RGB images and thermal data remains an open challenge. Previous works rely on naive fusion strategies, such as merging the two modalities at the input, concatenating multi-modality features inside models, or applying attention to each modality. These strategies are straightforward but insufficient. In this paper, we propose a novel fusion method named Explicit Attention-Enhanced Fusion (EAEF) that fully exploits each type of data. Specifically, we consider the following cases: i) both RGB and thermal data generate discriminative features, ii) only one modality does, and iii) neither does. EAEF uses one branch to enhance feature extraction for cases i) and iii), and another branch to remedy insufficient representations for case ii). The outputs of the two branches are fused to form complementary features. As a result, the proposed fusion method outperforms the state of the art by 1.6% in mIoU on semantic segmentation, 3.1% in MAE on salient object detection, 2.3% in mAP on object detection, and 8.1% in MAE on crowd counting. The code is available at https://github.com/FreeformRobotics/EAEFNet.
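
The abstract describes a two-branch design: an attention branch that enhances features when both (or neither) modality is discriminative, and a complementary branch that compensates when only one modality is informative. Below is a minimal PyTorch sketch of such a two-branch fusion block; the module structure, the channel-attention gating, and all names are illustrative assumptions, not the authors' implementation, which is available at https://github.com/FreeformRobotics/EAEFNet.

```python
# A minimal sketch of a two-branch attention fusion block for RGB-thermal
# features, loosely following the abstract's description of EAEF. Module
# names, shapes, and the gating logic are illustrative assumptions; see
# https://github.com/FreeformRobotics/EAEFNet for the authors' code.
import torch
import torch.nn as nn


class TwoBranchFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze-and-excitation style channel attention over the
        # concatenated RGB + thermal features (assumed design).
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels // reduction, 2 * channels, kernel_size=1),
        )
        # Separate convolutions so the two branches learn different roles.
        self.enhance = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        self.remedy = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, thermal], dim=1)          # (B, 2C, H, W)
        attn = torch.sigmoid(self.mlp(self.pool(x)))  # channel attention in (0, 1)
        # Branch 1: amplify channels the attention finds discriminative
        # (intended to cover cases i and iii from the abstract).
        enhanced = self.enhance(x * attn)
        # Branch 2: invert the attention to recover channels the first
        # branch suppresses, remedying a weak modality (case ii).
        remedied = self.remedy(x * (1.0 - attn))
        # Fuse the two branches into complementary features.
        return enhanced + remedied


if __name__ == "__main__":
    block = TwoBranchFusion(channels=64)
    rgb = torch.randn(2, 64, 48, 64)
    thermal = torch.randn(2, 64, 48, 64)
    print(block(rgb, thermal).shape)  # torch.Size([2, 64, 48, 64])
```

Inverting the attention map for the second branch is one simple way to re-emphasize channels the first branch suppresses; the per-branch convolutions keep the fused sum from collapsing back to the unweighted input.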
DOI: 10.1109/LRA.2023.3272269
ISSN: 2377-3766
Source: IEEE Electronic Library (IEL)
Subjects:
Color imagery
Crowd monitoring
Data mining
Decoding
Feature extraction
Fuses
Image segmentation
Multi-modality data fusion
Object detection
Object recognition
Perception
RGB-Thermal fusion
RGB-thermal perception
Salience
Semantic segmentation
Task analysis