Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks
Saved in:
Published in: | IEEE Robotics and Automation Letters, 2023-07, Vol. 8 (7), p. 1-8 |
---|---|
Main authors: | Liang, Mingjian; Hu, Junjie; Bao, Chenyu; Feng, Hua; Deng, Fuqin; Lam, Tin Lun |
Format: | Article |
Language: | English |
Subjects: | Color imagery; Crowd monitoring; Data mining; Decoding; Feature extraction; Fuses; Image segmentation; Multi-modality data fusion; Object detection; Object recognition; Perception; RGB-Thermal fusion; RGB-thermal perception; Salience; Semantic segmentation; Task analysis |
Online access: | Request full text |
container_end_page | 8 |
---|---|
container_issue | 7 |
container_start_page | 1 |
container_title | IEEE robotics and automation letters |
container_volume | 8 |
creator | Liang, Mingjian ; Hu, Junjie ; Bao, Chenyu ; Feng, Hua ; Deng, Fuqin ; Lam, Tin Lun |
description | RGB-Thermal-based perception has recently shown significant advances. Thermal information provides useful cues when visual cameras suffer from poor lighting conditions, such as low light and fog. However, how to effectively fuse RGB images and thermal data remains an open challenge. Previous works rely on naive fusion strategies, such as merging the modalities at the input, concatenating multi-modality features inside models, or applying attention to each modality separately. These strategies are straightforward yet insufficient. In this paper, we propose a novel fusion method named Explicit Attention-Enhanced Fusion (EAEF) that fully exploits each type of data. Specifically, we consider three cases: i) both RGB and thermal data generate discriminative features, ii) only one modality does, and iii) neither does. EAEF uses one branch to enhance feature extraction for cases i) and iii), and another branch to remedy insufficient representations for case ii). The outputs of the two branches are fused to form complementary features. As a result, the proposed fusion method outperforms the state of the art by 1.6% mIoU on semantic segmentation, 3.1% MAE on salient object detection, 2.3% mAP on object detection, and 8.1% MAE on crowd counting. The code is available at https://github.com/FreeformRobotics/EAEFNet . |
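The abstract's core idea — one attention branch enhancing features where the modalities are jointly discriminative, and a second branch compensating when only a single modality is informative — can be sketched in a few lines. The following is a minimal illustrative sketch in PyTorch, not the authors' EAEF implementation (see https://github.com/FreeformRobotics/EAEFNet for the official code); the module name, the shared squeeze-and-excitation-style attention MLP, and the product/complement interaction between the two attention maps are assumptions made for illustration.

```python
# Hedged sketch: a two-branch attention fusion in the spirit of the abstract.
# NOT the authors' EAEF implementation; shapes and the interaction scheme
# are illustrative assumptions.
import torch
import torch.nn as nn


class TwoBranchAttentionFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared channel-attention MLP (squeeze-and-excitation style).
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # Per-modality channel attention in [0, 1], shape (B, C, 1, 1).
        a_rgb = torch.sigmoid(self.mlp(self.pool(rgb)))
        a_th = torch.sigmoid(self.mlp(self.pool(thermal)))
        # Enhancement branch: weight is large only where BOTH modalities
        # are confident (product of the two attention maps).
        both = a_rgb * a_th
        enhanced = rgb * both + thermal * both
        # Compensation branch: the complement weight is large where the
        # modalities disagree; each input is gated by its own attention,
        # so the single confident modality dominates.
        only_one = 1.0 - both
        compensated = rgb * a_rgb * only_one + thermal * a_th * only_one
        # Fuse the two branches into complementary features.
        return self.fuse(torch.cat([enhanced, compensated], dim=1))


# Usage: fuse 64-channel feature maps from an RGB and a thermal encoder stage.
if __name__ == "__main__":
    fusion = TwoBranchAttentionFusion(channels=64)
    f_rgb = torch.randn(2, 64, 32, 32)
    f_th = torch.randn(2, 64, 32, 32)
    print(fusion(f_rgb, f_th).shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch the product `a_rgb * a_th` handles case i) (both modalities discriminative) and its complement routes weight to whichever single modality is confident, mirroring case ii); case iii) from the abstract (neither modality discriminative) has no dedicated path here and is one of the aspects the paper's actual EAEF design addresses.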
doi_str_mv | 10.1109/LRA.2023.3272269 |
format | Article |
eissn | 2377-3766 |
coden | IRALC6 |
publisher | Piscataway: IEEE |
ieee_id | 10113725 |
orcidid | 0000-0002-6363-1446; 0000-0002-1911-4361; 0000-0002-7071-7184; 0000-0002-8098-9095 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2377-3766 |
ispartof | IEEE robotics and automation letters, 2023-07, Vol.8 (7), p.1-8 |
issn | 2377-3766 2377-3766 |
language | eng |
recordid | cdi_proquest_journals_2823194287 |
source | IEEE Electronic Library (IEL) |
subjects | Color imagery; Crowd monitoring; Data mining; Decoding; Feature extraction; Fuses; Image segmentation; Multi-modality data fusion; Object detection; Object recognition; Perception; RGB-Thermal fusion; RGB-thermal perception; Salience; Semantic segmentation; Task analysis |
title | Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T22%3A04%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Explicit%20Attention-Enhanced%20Fusion%20for%20RGB-Thermal%20Perception%20Tasks&rft.jtitle=IEEE%20robotics%20and%20automation%20letters&rft.au=Liang,%20Mingjian&rft.date=2023-07-01&rft.volume=8&rft.issue=7&rft.spage=1&rft.epage=8&rft.pages=1-8&rft.issn=2377-3766&rft.eissn=2377-3766&rft.coden=IRALC6&rft_id=info:doi/10.1109/LRA.2023.3272269&rft_dat=%3Cproquest_RIE%3E2823194287%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2823194287&rft_id=info:pmid/&rft_ieee_id=10113725&rfr_iscdi=true |