Explicit High-Level Semantic Network for Domain Generalization in Hyperspectral Image Classification
When applied across different scenes, hyperspectral image (HSI) classification models often struggle to generalize due to data distribution disparities and label scarcity, leading to domain shift (DS) problems. Recently, high-level semantics from text have demonstrated the potential to...
Saved in:
Published in: | IEEE transactions on geoscience and remote sensing 2024, Vol.62, p.1-14 |
---|---|
Main authors: | Wang, Xusheng; Dong, Shoubin; Zheng, Xiaorou; Lu, Runuo; Jia, Jianxin |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 14 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on geoscience and remote sensing |
container_volume | 62 |
creator | Wang, Xusheng; Dong, Shoubin; Zheng, Xiaorou; Lu, Runuo; Jia, Jianxin |
description | When applied across different scenes, hyperspectral image (HSI) classification models often struggle to generalize due to data distribution disparities and label scarcity, leading to domain shift (DS) problems. Recently, high-level semantics from text have demonstrated the potential to address the DS problem by improving the generalization capability of image encoders through the alignment of image-text pairs. However, the main challenge still lies in crafting texts that accurately represent the intricate interrelationships and fragmented nature of land cover in HSIs, and in effectively extracting spectral-spatial features from HSI data. This article proposes a domain generalization (DG) method, EHSnet, which addresses these issues by leveraging multilayered explicit high-level semantic (EHS) information from different types of texts to provide precisely relevant semantic information to the image encoder. A multilayered EHS information paradigm is defined to capture the intricate interrelationships and fragmented land-cover features of HSIs, and a dual-residual encoder connected by a 2-D convolution is designed, combining CNNs with a residual structure and Vision Transformers (ViTs) with short-range cross-layer connections to explore the spectral-spatial features of HSIs. By aligning text features with image features in the semantic space, EHSnet improves the representation capability of the image encoder and gains zero-shot generalization ability for cross-scene tasks. Extensive experiments on three hyperspectral datasets (Houston, Pavia, and XS) validate the effectiveness and superiority of EHSnet, with the Kappa coefficient improved by 8.17%, 3.22%, and 3.62%, respectively, over state-of-the-art (SOTA) methods. The code is available at https://github.com/SCUT-CCNL/EHSnet . |
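The abstract describes aligning image features with text features in a shared semantic space so the image encoder inherits the text's high-level semantics. As a rough, hypothetical sketch of that general mechanism (a CLIP-style symmetric contrastive loss, not the authors' actual EHSnet implementation — function names and the temperature value are illustrative assumptions), the alignment objective over a batch of matched image-text pairs can be written in NumPy:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project features onto the unit sphere so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def image_text_alignment_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs.

    image_feats, text_feats: (N, D) arrays; row i of each is a matched pair.
    Returns the average of the image->text and text->image cross-entropies,
    which is minimized when each image is most similar to its own text.
    """
    img = l2_normalize(image_feats)
    txt = l2_normalize(text_feats)
    logits = img @ txt.T / temperature       # (N, N) similarity matrix
    labels = np.arange(len(logits))          # matched pairs lie on the diagonal

    def cross_entropy(z, y):
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

# Perfectly aligned pairs drive the loss toward zero; mismatched pairs do not.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
aligned = image_text_alignment_loss(feats, feats)
shuffled = image_text_alignment_loss(feats, rng.normal(size=(8, 16)))
```

In this framing, zero-shot cross-scene classification follows for free: a test pixel's feature is compared against the text features of every class name, and the highest-similarity class wins, with no target-domain labels required.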
doi_str_mv | 10.1109/TGRS.2024.3495765 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_3133488689</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10750220</ieee_id><sourcerecordid>3133488689</sourcerecordid><originalsourceid>FETCH-LOGICAL-c919-b4eba9d40772727336fb00fee1d33014513de0bf54db939fca1e3edc31c6bf1e3</originalsourceid><addsrcrecordid>eNpNkFtLAzEQhYMoWKs_QPAh4PPWZJO95FFqbQtFwfY9ZLOTmro3k61af73p5UHmYYbDOTPDh9AtJSNKiXhYTd-Wo5jEfMS4SLI0OUMDmiR5RFLOz9GAUJFGcS7iS3Tl_YYQyhOaDVA5-ekqq22PZ3b9Hi3gCyq8hFo1vdX4Bfrv1n1g0zr81NbKNngKDThV2V_V27bBQZntOnC-A90HHc9rtQY8rpT31lh9cF2jC6MqDzenPkSr58lqPIsWr9P5-HERaUFFVHAolCg5ybI4FGOpKQgxALRk7PAvK4EUJuFlIZgwWlFgUGpGdVqYMA_R_XFt59rPLfhebtqta8JFyShjPM_TXAQXPbq0a713YGTnbK3cTlIi9yzlnqXcs5QnliFzd8xYAPjnzxISx4T9Acg7cak</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3133488689</pqid></control><display><type>article</type><title>Explicit High-Level Semantic Network for Domain Generalization in Hyperspectral Image Classification</title><source>IEEE Electronic Library (IEL)</source><creator>Wang, Xusheng ; Dong, Shoubin ; Zheng, Xiaorou ; Lu, Runuo ; Jia, Jianxin</creator><creatorcontrib>Wang, Xusheng ; Dong, Shoubin ; Zheng, Xiaorou ; Lu, Runuo ; Jia, Jianxin</creatorcontrib><description>When applied across different scenes, hyperspectral image (HSI) classification models often struggle to generalize due to the data distribution disparities and labels' scarcity, leading to domain shift (DS) problems. Recently, the high-level semantics from text has demonstrated the potential to address the DS problem, by improving the generalization capability of image encoders through aligning image-text pairs. 
However, the main challenge still lies in crafting appropriate texts that accurately represent the intricate interrelationships and the fragmented nature of land cover in HSIs and effectively extracting spectral-spatial features from HSI data. This article proposes a domain generalization (DG) method, EHSnet, to address these issues by leveraging multilayered explicit high-level semantic (EHS) information from different types of texts to provide precisely relevant semantic information for the image encoder. A multilayered EHS information paradigm is well-defined, aiming to extract the HSI's intricate interrelationships and the fragmented land-cover features, and a dual-residual encoder connected by a 2-D convolution is designed, which combines CNNs with residual structure and Vision Transformers (ViTs) with short-range cross-layer connections to explore the spectral-spatial features of HSIs. By aligning text features with image features in the semantic space, EHSnet improves the representation capability of the image encoder and is endowed with zero-shot generalization ability for cross-scene tasks. Extensive experiments conducted on three hyperspectral datasets, including Houston, Pavia, and XS datasets, validate the effectiveness and superiority of EHSnet, with the Kappa coefficient improved by 8.17%, 3.22%, and 3.62% across three datasets compared to the state-of-the-art (SOTA) methods. 
The code is available at https://github.com/SCUT-CCNL/EHSnet .</description><identifier>ISSN: 0196-2892</identifier><identifier>EISSN: 1558-0644</identifier><identifier>DOI: 10.1109/TGRS.2024.3495765</identifier><identifier>CODEN: IGRSD2</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Classification ; Coders ; Contrastive learning ; Data mining ; Data models ; Datasets ; Domain generalization (DG) ; Feature extraction ; hyperspectral image (HSI) classification ; Hyperspectral imaging ; Image classification ; Image processing ; Information processing ; Land cover ; Land surface ; multiple modality ; Representation learning ; Semantics ; Spatial data ; Texts ; Training ; Visualization ; visual–language model</subject><ispartof>IEEE transactions on geoscience and remote sensing, 2024, Vol.62, p.1-14</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c919-b4eba9d40772727336fb00fee1d33014513de0bf54db939fca1e3edc31c6bf1e3</cites><orcidid>0000-0002-4941-5907 ; 0009-0008-5429-3984 ; 0000-0003-0153-850X ; 0000-0003-4366-4547</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10750220$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,777,781,793,4010,27904,27905,27906,54739</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10750220$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Wang, Xusheng</creatorcontrib><creatorcontrib>Dong, Shoubin</creatorcontrib><creatorcontrib>Zheng, Xiaorou</creatorcontrib><creatorcontrib>Lu, Runuo</creatorcontrib><creatorcontrib>Jia, Jianxin</creatorcontrib><title>Explicit High-Level Semantic Network for Domain 
Generalization in Hyperspectral Image Classification</title><title>IEEE transactions on geoscience and remote sensing</title><addtitle>TGRS</addtitle><description>When applied across different scenes, hyperspectral image (HSI) classification models often struggle to generalize due to the data distribution disparities and labels' scarcity, leading to domain shift (DS) problems. Recently, the high-level semantics from text has demonstrated the potential to address the DS problem, by improving the generalization capability of image encoders through aligning image-text pairs. However, the main challenge still lies in crafting appropriate texts that accurately represent the intricate interrelationships and the fragmented nature of land cover in HSIs and effectively extracting spectral-spatial features from HSI data. This article proposes a domain generalization (DG) method, EHSnet, to address these issues by leveraging multilayered explicit high-level semantic (EHS) information from different types of texts to provide precisely relevant semantic information for the image encoder. A multilayered EHS information paradigm is well-defined, aiming to extract the HSI's intricate interrelationships and the fragmented land-cover features, and a dual-residual encoder connected by a 2-D convolution is designed, which combines CNNs with residual structure and Vision Transformers (ViTs) with short-range cross-layer connections to explore the spectral-spatial features of HSIs. By aligning text features with image features in the semantic space, EHSnet improves the representation capability of the image encoder and is endowed with zero-shot generalization ability for cross-scene tasks. Extensive experiments conducted on three hyperspectral datasets, including Houston, Pavia, and XS datasets, validate the effectiveness and superiority of EHSnet, with the Kappa coefficient improved by 8.17%, 3.22%, and 3.62% across three datasets compared to the state-of-the-art (SOTA) methods. 
The code is available at https://github.com/SCUT-CCNL/EHSnet .</description><subject>Classification</subject><subject>Coders</subject><subject>Contrastive learning</subject><subject>Data mining</subject><subject>Data models</subject><subject>Datasets</subject><subject>Domain generalization (DG)</subject><subject>Feature extraction</subject><subject>hyperspectral image (HSI) classification</subject><subject>Hyperspectral imaging</subject><subject>Image classification</subject><subject>Image processing</subject><subject>Information processing</subject><subject>Land cover</subject><subject>Land surface</subject><subject>multiple modality</subject><subject>Representation learning</subject><subject>Semantics</subject><subject>Spatial data</subject><subject>Texts</subject><subject>Training</subject><subject>Visualization</subject><subject>visual–language model</subject><issn>0196-2892</issn><issn>1558-0644</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkFtLAzEQhYMoWKs_QPAh4PPWZJO95FFqbQtFwfY9ZLOTmro3k61af73p5UHmYYbDOTPDh9AtJSNKiXhYTd-Wo5jEfMS4SLI0OUMDmiR5RFLOz9GAUJFGcS7iS3Tl_YYQyhOaDVA5-ekqq22PZ3b9Hi3gCyq8hFo1vdX4Bfrv1n1g0zr81NbKNngKDThV2V_V27bBQZntOnC-A90HHc9rtQY8rpT31lh9cF2jC6MqDzenPkSr58lqPIsWr9P5-HERaUFFVHAolCg5ybI4FGOpKQgxALRk7PAvK4EUJuFlIZgwWlFgUGpGdVqYMA_R_XFt59rPLfhebtqta8JFyShjPM_TXAQXPbq0a713YGTnbK3cTlIi9yzlnqXcs5QnliFzd8xYAPjnzxISx4T9Acg7cak</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Wang, Xusheng</creator><creator>Dong, Shoubin</creator><creator>Zheng, Xiaorou</creator><creator>Lu, Runuo</creator><creator>Jia, Jianxin</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7UA</scope><scope>8FD</scope><scope>C1K</scope><scope>F1W</scope><scope>FR3</scope><scope>H8D</scope><scope>H96</scope><scope>KR7</scope><scope>L.G</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0002-4941-5907</orcidid><orcidid>https://orcid.org/0009-0008-5429-3984</orcidid><orcidid>https://orcid.org/0000-0003-0153-850X</orcidid><orcidid>https://orcid.org/0000-0003-4366-4547</orcidid></search><sort><creationdate>2024</creationdate><title>Explicit High-Level Semantic Network for Domain Generalization in Hyperspectral Image Classification</title><author>Wang, Xusheng ; Dong, Shoubin ; Zheng, Xiaorou ; Lu, Runuo ; Jia, Jianxin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c919-b4eba9d40772727336fb00fee1d33014513de0bf54db939fca1e3edc31c6bf1e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Classification</topic><topic>Coders</topic><topic>Contrastive learning</topic><topic>Data mining</topic><topic>Data models</topic><topic>Datasets</topic><topic>Domain generalization (DG)</topic><topic>Feature extraction</topic><topic>hyperspectral image (HSI) classification</topic><topic>Hyperspectral imaging</topic><topic>Image classification</topic><topic>Image processing</topic><topic>Information processing</topic><topic>Land cover</topic><topic>Land surface</topic><topic>multiple modality</topic><topic>Representation learning</topic><topic>Semantics</topic><topic>Spatial data</topic><topic>Texts</topic><topic>Training</topic><topic>Visualization</topic><topic>visual–language model</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Xusheng</creatorcontrib><creatorcontrib>Dong, Shoubin</creatorcontrib><creatorcontrib>Zheng, Xiaorou</creatorcontrib><creatorcontrib>Lu, 
Runuo</creatorcontrib><creatorcontrib>Jia, Jianxin</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Water Resources Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ASFA: Aquatic Sciences and Fisheries Abstracts</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) 2: Ocean Technology, Policy & Non-Living Resources</collection><collection>Civil Engineering Abstracts</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) Professional</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on geoscience and remote sensing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wang, Xusheng</au><au>Dong, Shoubin</au><au>Zheng, Xiaorou</au><au>Lu, Runuo</au><au>Jia, Jianxin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Explicit High-Level Semantic Network for Domain Generalization in Hyperspectral Image Classification</atitle><jtitle>IEEE transactions on geoscience and remote sensing</jtitle><stitle>TGRS</stitle><date>2024</date><risdate>2024</risdate><volume>62</volume><spage>1</spage><epage>14</epage><pages>1-14</pages><issn>0196-2892</issn><eissn>1558-0644</eissn><coden>IGRSD2</coden><abstract>When applied across different scenes, hyperspectral image (HSI) classification models often struggle to generalize due to the data distribution disparities and labels' scarcity, leading to domain shift (DS) problems. 
Recently, the high-level semantics from text has demonstrated the potential to address the DS problem, by improving the generalization capability of image encoders through aligning image-text pairs. However, the main challenge still lies in crafting appropriate texts that accurately represent the intricate interrelationships and the fragmented nature of land cover in HSIs and effectively extracting spectral-spatial features from HSI data. This article proposes a domain generalization (DG) method, EHSnet, to address these issues by leveraging multilayered explicit high-level semantic (EHS) information from different types of texts to provide precisely relevant semantic information for the image encoder. A multilayered EHS information paradigm is well-defined, aiming to extract the HSI's intricate interrelationships and the fragmented land-cover features, and a dual-residual encoder connected by a 2-D convolution is designed, which combines CNNs with residual structure and Vision Transformers (ViTs) with short-range cross-layer connections to explore the spectral-spatial features of HSIs. By aligning text features with image features in the semantic space, EHSnet improves the representation capability of the image encoder and is endowed with zero-shot generalization ability for cross-scene tasks. Extensive experiments conducted on three hyperspectral datasets, including Houston, Pavia, and XS datasets, validate the effectiveness and superiority of EHSnet, with the Kappa coefficient improved by 8.17%, 3.22%, and 3.62% across three datasets compared to the state-of-the-art (SOTA) methods. 
The code is available at https://github.com/SCUT-CCNL/EHSnet .</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TGRS.2024.3495765</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-4941-5907</orcidid><orcidid>https://orcid.org/0009-0008-5429-3984</orcidid><orcidid>https://orcid.org/0000-0003-0153-850X</orcidid><orcidid>https://orcid.org/0000-0003-4366-4547</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0196-2892 |
ispartof | IEEE transactions on geoscience and remote sensing, 2024, Vol.62, p.1-14 |
issn | 0196-2892; 1558-0644 |
language | eng |
recordid | cdi_proquest_journals_3133488689 |
source | IEEE Electronic Library (IEL) |
subjects | Classification; Coders; Contrastive learning; Data mining; Data models; Datasets; Domain generalization (DG); Feature extraction; hyperspectral image (HSI) classification; Hyperspectral imaging; Image classification; Image processing; Information processing; Land cover; Land surface; multiple modality; Representation learning; Semantics; Spatial data; Texts; Training; Visualization; visual–language model |
title | Explicit High-Level Semantic Network for Domain Generalization in Hyperspectral Image Classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T19%3A35%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Explicit%20High-Level%20Semantic%20Network%20for%20Domain%20Generalization%20in%20Hyperspectral%20Image%20Classification&rft.jtitle=IEEE%20transactions%20on%20geoscience%20and%20remote%20sensing&rft.au=Wang,%20Xusheng&rft.date=2024&rft.volume=62&rft.spage=1&rft.epage=14&rft.pages=1-14&rft.issn=0196-2892&rft.eissn=1558-0644&rft.coden=IGRSD2&rft_id=info:doi/10.1109/TGRS.2024.3495765&rft_dat=%3Cproquest_RIE%3E3133488689%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3133488689&rft_id=info:pmid/&rft_ieee_id=10750220&rfr_iscdi=true |