HEFANet: hierarchical efficient fusion and aggregation segmentation network for enhanced RGB-thermal urban scene parsing

RGB-Thermal semantic segmentation is important in widespread applications in adverse illumination conditions, such as autonomous driving and robotic sensing. However, most existing methods ignore the feature differences between the two modalities and do not effectively exploit and handle the feature...

Detailed description

Bibliographic details
Published in:	Applied intelligence (Dordrecht, Netherlands), 2024-11, Vol.54 (22), p.11248-11266
Main authors:	Shen, Zhengwen, Pan, Zaiyu, Weng, Yuchen, Li, Yulian, Wang, Jiangyu, Wang, Jun
Format:	Article
Language:	eng
Subjects:	Artificial Intelligence Computer Science Machines Manufacturing Mechanical Engineering Modules Processes Robot sensors Semantic segmentation Semantics
Online access:	Full text

container_end_page	11266
container_issue	22
container_start_page	11248
container_title	Applied intelligence (Dordrecht, Netherlands)
container_volume	54
creator	Shen, Zhengwen Pan, Zaiyu Weng, Yuchen Li, Yulian Wang, Jiangyu Wang, Jun
description	RGB-Thermal semantic segmentation is important in widespread applications in adverse illumination conditions, such as autonomous driving and robotic sensing. However, most existing methods ignore the feature differences between the two modalities and do not effectively exploit and handle the features at different levels. In this paper, we present a novel multimodal feature fusion network named HEFANet, which effectively enhances the interaction and fusion of features. Concretely, we propose a Cross-layer and Cross-modal Feature Descriptor module (CCFD) to mitigate differences between different multimodal data and to mine the valuable and correlated features of cross-layers. To effectively fuse multimodal features at different levels, we propose a Multi-modal Interleaved Sparse Self-Attention module (MISSA) to aggregate rich spatial semantic information in the earlier layers. Then, we propose the Spatial Interaction and Channel Selection module (SICS) in the last layer to enhance the representation of rich contextual features and highlight important information by channel communication interactions for optimal sparse feature aggregation selectively. Extensive experiments were carried out on three publicly available datasets (MFNet, PST900, and FMB), and achieved new state-of-the-art results. The code and results are available at https://github.com/shenzw21/HEFANet .
doi_str_mv	10.1007/s10489-024-05743-0
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3106536726</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3106536726</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-dafa5a45e2f41f6ef09d3de66a0a003e473af498591028e342c2ba4e4e415b893</originalsourceid><addsrcrecordid>eNp9UE1Lw0AQXUTB-vEHPC14js5mN0njrZTWCkUvCt6WSTKbpLabupug_nu3RvAmcxge72OYx9iVgBsBkN16AWqaRxCrCJJMyQiO2EQkmYwylWfHbAJ5oNI0fz1lZ95vAEBKEBP2uVosZ4_U3_GmJYeubNoSt5yMacuWbM_N4NvOcrQVx7p2VGN_wJ7qXaBHYKn_6NwbN53jZBu0JVXc1UXUN-R2IW5wBQZPSZb4Hp1vbX3BTgxuPV3-7nP2slw8z1fR-un-YT5bR2UM0EcVGkxQJRQbJUxKBvJKVpSmCBh-IJVJNCqfJrmAeEpSxWVcoKIwIimmuTxn12Pu3nXvA_leb7rB2XBSSwFpItMsToMqHlWl67x3ZPTetTt0X1qAPjSsx4Z1aFj_NKwhmORo8kFsa3J_0f-4vgHvkoBV</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3106536726</pqid></control><display><type>article</type><title>HEFANet: hierarchical efficient fusion and aggregation segmentation network for enhanced rgb-thermal urban scene parsing</title><source>SpringerLink Journals - AutoHoldings</source><creator>Shen, Zhengwen ; Pan, Zaiyu ; Weng, Yuchen ; Li, Yulian ; Wang, Jiangyu ; Wang, Jun</creator><creatorcontrib>Shen, Zhengwen ; Pan, Zaiyu ; Weng, Yuchen ; Li, Yulian ; Wang, Jiangyu ; Wang, Jun</creatorcontrib><description>RGB-Thermal semantic segmentation is important in widespread applications in adverse illumination conditions, such as autonomous driving and robotic sensing. However, most existing methods ignore the feature differences between the two modalities and do not effectively exploit and handle the features at different levels. In this paper, we present a novel multimodal feature fusion network named HEFANet, which effectively enhances the interaction and fusion of features. 
Concretely, we propose a Cross-layer and Cross-modal Feature Descriptor module (CCFD) to mitigate differences between different multimodal data and to mine the valuable and correlated features of cross-layers. To effectively fuse multimodal features at different levels, we propose a Multi-modal Interleaved Sparse Self-Attention module (MISSA) to aggregate rich spatial semantic information in the earlier layers. Then, we propose the Spatial Interaction and Channel Selection module (SICS) in the last layer to enhance the representation of rich contextual features and highlight important information by channel communication interactions for optimal sparse feature aggregation selectively. Extensive experiments were carried out on three publicly available datasets (MFNet, PST900, and FMB), and achieved new state-of-the-art results. The code and results are available at https://github.com/shenzw21/HEFANet .</description><identifier>ISSN: 0924-669X</identifier><identifier>EISSN: 1573-7497</identifier><identifier>DOI: 10.1007/s10489-024-05743-0</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Artificial Intelligence ; Computer Science ; Machines ; Manufacturing ; Mechanical Engineering ; Modules ; Processes ; Robot sensors ; Semantic segmentation ; Semantics</subject><ispartof>Applied intelligence (Dordrecht, Netherlands), 2024-11, Vol.54 (22), p.11248-11266</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-dafa5a45e2f41f6ef09d3de66a0a003e473af498591028e342c2ba4e4e415b893</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10489-024-05743-0$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10489-024-05743-0$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27923,27924,41487,42556,51318</link.rule.ids></links><search><creatorcontrib>Shen, Zhengwen</creatorcontrib><creatorcontrib>Pan, Zaiyu</creatorcontrib><creatorcontrib>Weng, Yuchen</creatorcontrib><creatorcontrib>Li, Yulian</creatorcontrib><creatorcontrib>Wang, Jiangyu</creatorcontrib><creatorcontrib>Wang, Jun</creatorcontrib><title>HEFANet: hierarchical efficient fusion and aggregation segmentation network for enhanced rgb-thermal urban scene parsing</title><title>Applied intelligence (Dordrecht, Netherlands)</title><addtitle>Appl Intell</addtitle><description>RGB-Thermal semantic segmentation is important in widespread applications in adverse illumination conditions, such as autonomous driving and robotic sensing. However, most existing methods ignore the feature differences between the two modalities and do not effectively exploit and handle the features at different levels. In this paper, we present a novel multimodal feature fusion network named HEFANet, which effectively enhances the interaction and fusion of features. 
Concretely, we propose a Cross-layer and Cross-modal Feature Descriptor module (CCFD) to mitigate differences between different multimodal data and to mine the valuable and correlated features of cross-layers. To effectively fuse multimodal features at different levels, we propose a Multi-modal Interleaved Sparse Self-Attention module (MISSA) to aggregate rich spatial semantic information in the earlier layers. Then, we propose the Spatial Interaction and Channel Selection module (SICS) in the last layer to enhance the representation of rich contextual features and highlight important information by channel communication interactions for optimal sparse feature aggregation selectively. Extensive experiments were carried out on three publicly available datasets (MFNet, PST900, and FMB), and achieved new state-of-the-art results. The code and results are available at https://github.com/shenzw21/HEFANet .</description><subject>Artificial Intelligence</subject><subject>Computer Science</subject><subject>Machines</subject><subject>Manufacturing</subject><subject>Mechanical Engineering</subject><subject>Modules</subject><subject>Processes</subject><subject>Robot sensors</subject><subject>Semantic segmentation</subject><subject>Semantics</subject><issn>0924-669X</issn><issn>1573-7497</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9UE1Lw0AQXUTB-vEHPC14js5mN0njrZTWCkUvCt6WSTKbpLabupug_nu3RvAmcxge72OYx9iVgBsBkN16AWqaRxCrCJJMyQiO2EQkmYwylWfHbAJ5oNI0fz1lZ95vAEBKEBP2uVosZ4_U3_GmJYeubNoSt5yMacuWbM_N4NvOcrQVx7p2VGN_wJ7qXaBHYKn_6NwbN53jZBu0JVXc1UXUN-R2IW5wBQZPSZb4Hp1vbX3BTgxuPV3-7nP2slw8z1fR-un-YT5bR2UM0EcVGkxQJRQbJUxKBvJKVpSmCBh-IJVJNCqfJrmAeEpSxWVcoKIwIimmuTxn12Pu3nXvA_leb7rB2XBSSwFpItMsToMqHlWl67x3ZPTetTt0X1qAPjSsx4Z1aFj_NKwhmORo8kFsa3J_0f-4vgHvkoBV</recordid><startdate>20241101</startdate><enddate>20241101</enddate><creator>Shen, Zhengwen</creator><creator>Pan, 
Zaiyu</creator><creator>Weng, Yuchen</creator><creator>Li, Yulian</creator><creator>Wang, Jiangyu</creator><creator>Wang, Jun</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20241101</creationdate><title>HEFANet: hierarchical efficient fusion and aggregation segmentation network for enhanced rgb-thermal urban scene parsing</title><author>Shen, Zhengwen ; Pan, Zaiyu ; Weng, Yuchen ; Li, Yulian ; Wang, Jiangyu ; Wang, Jun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-dafa5a45e2f41f6ef09d3de66a0a003e473af498591028e342c2ba4e4e415b893</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Artificial Intelligence</topic><topic>Computer Science</topic><topic>Machines</topic><topic>Manufacturing</topic><topic>Mechanical Engineering</topic><topic>Modules</topic><topic>Processes</topic><topic>Robot sensors</topic><topic>Semantic segmentation</topic><topic>Semantics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shen, Zhengwen</creatorcontrib><creatorcontrib>Pan, Zaiyu</creatorcontrib><creatorcontrib>Weng, Yuchen</creatorcontrib><creatorcontrib>Li, Yulian</creatorcontrib><creatorcontrib>Wang, Jiangyu</creatorcontrib><creatorcontrib>Wang, Jun</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Applied 
intelligence (Dordrecht, Netherlands)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shen, Zhengwen</au><au>Pan, Zaiyu</au><au>Weng, Yuchen</au><au>Li, Yulian</au><au>Wang, Jiangyu</au><au>Wang, Jun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>HEFANet: hierarchical efficient fusion and aggregation segmentation network for enhanced rgb-thermal urban scene parsing</atitle><jtitle>Applied intelligence (Dordrecht, Netherlands)</jtitle><stitle>Appl Intell</stitle><date>2024-11-01</date><risdate>2024</risdate><volume>54</volume><issue>22</issue><spage>11248</spage><epage>11266</epage><pages>11248-11266</pages><issn>0924-669X</issn><eissn>1573-7497</eissn><abstract>RGB-Thermal semantic segmentation is important in widespread applications in adverse illumination conditions, such as autonomous driving and robotic sensing. However, most existing methods ignore the feature differences between the two modalities and do not effectively exploit and handle the features at different levels. In this paper, we present a novel multimodal feature fusion network named HEFANet, which effectively enhances the interaction and fusion of features. Concretely, we propose a Cross-layer and Cross-modal Feature Descriptor module (CCFD) to mitigate differences between different multimodal data and to mine the valuable and correlated features of cross-layers. To effectively fuse multimodal features at different levels, we propose a Multi-modal Interleaved Sparse Self-Attention module (MISSA) to aggregate rich spatial semantic information in the earlier layers. Then, we propose the Spatial Interaction and Channel Selection module (SICS) in the last layer to enhance the representation of rich contextual features and highlight important information by channel communication interactions for optimal sparse feature aggregation selectively. 
Extensive experiments were carried out on three publicly available datasets (MFNet, PST900, and FMB), and achieved new state-of-the-art results. The code and results are available at https://github.com/shenzw21/HEFANet .</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10489-024-05743-0</doi><tpages>19</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0924-669X
ispartof	Applied intelligence (Dordrecht, Netherlands), 2024-11, Vol.54 (22), p.11248-11266
issn	0924-669X 1573-7497
language	eng
recordid	cdi_proquest_journals_3106536726
source	SpringerLink Journals - AutoHoldings
subjects	Artificial Intelligence Computer Science Machines Manufacturing Mechanical Engineering Modules Processes Robot sensors Semantic segmentation Semantics
title	HEFANet: hierarchical efficient fusion and aggregation segmentation network for enhanced rgb-thermal urban scene parsing
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T11%3A21%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=HEFANet:%20hierarchical%20efficient%20fusion%20and%20aggregation%20segmentation%20network%20for%20enhanced%20rgb-thermal%20urban%20scene%20parsing&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Shen,%20Zhengwen&rft.date=2024-11-01&rft.volume=54&rft.issue=22&rft.spage=11248&rft.epage=11266&rft.pages=11248-11266&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-024-05743-0&rft_dat=%3Cproquest_cross%3E3106536726%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3106536726&rft_id=info:pmid/&rfr_iscdi=true