Explicit High-Level Semantic Network for Domain Generalization in Hyperspectral Image Classification

Bibliographic details
Published in: IEEE Transactions on Geoscience and Remote Sensing, 2024, Vol. 62, pp. 1-14
Main authors: Wang, Xusheng; Dong, Shoubin; Zheng, Xiaorou; Lu, Runuo; Jia, Jianxin
Format: Article
Language: English
Description
Abstract: When applied across different scenes, hyperspectral image (HSI) classification models often struggle to generalize because of disparities in data distribution and the scarcity of labels, which lead to domain shift (DS) problems. Recently, high-level semantics from text have shown potential for addressing the DS problem by improving the generalization capability of image encoders through the alignment of image-text pairs. However, the main challenges remain crafting appropriate texts that accurately represent the intricate interrelationships and fragmented nature of land cover in HSIs, and effectively extracting spectral-spatial features from HSI data. This article proposes a domain generalization (DG) method, EHSnet, which addresses these issues by leveraging multilayered explicit high-level semantic (EHS) information from different types of texts to provide precisely relevant semantic information for the image encoder. A multilayered EHS information paradigm is defined to capture the intricate interrelationships and fragmented land-cover features of HSIs. In addition, a dual-residual encoder connected by a 2-D convolution is designed to explore the spectral-spatial features of HSIs; it combines CNNs with a residual structure and Vision Transformers (ViTs) with short-range cross-layer connections. By aligning text features with image features in the semantic space, EHSnet improves the representation capability of the image encoder and gains zero-shot generalization ability for cross-scene tasks. Extensive experiments on three hyperspectral datasets, namely Houston, Pavia, and XS, validate the effectiveness and superiority of EHSnet, which improves the Kappa coefficient by 8.17%, 3.22%, and 3.62%, respectively, over state-of-the-art (SOTA) methods. The code is available at https://github.com/SCUT-CCNL/EHSnet.
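The abstract mentions three generic ideas: aligning text features with image features in a shared semantic space, zero-shot cross-scene prediction, and evaluation with the Kappa coefficient. The minimal NumPy sketch below illustrates those ideas under stated assumptions. The CLIP-style symmetric contrastive (InfoNCE) loss is an assumed stand-in and not necessarily EHSnet's exact objective. The function names and the temperature of 0.07 are hypothetical. The feature arrays stand in for outputs of the image and text encoders, which are not modeled here; the authors' actual implementation is in the linked repository.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def contrastive_alignment_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of matched (image, text) pairs.

    Assumption (hypothetical setup): row i of img_feats pairs with row i of txt_feats.
    """
    img = l2_normalize(img_feats)
    txt = l2_normalize(txt_feats)
    logits = img @ txt.T / temperature           # (N, N) scaled cosine similarities
    targets = np.arange(logits.shape[0])         # matching pairs lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[targets, targets].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def zero_shot_predict(img_feats, class_txt_feats):
    """Assign each image to the class whose text embedding is most similar."""
    sims = l2_normalize(img_feats) @ l2_normalize(class_txt_feats).T
    return sims.argmax(axis=1)

def cohen_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa from a confusion matrix: (p_o - p_e) / (1 - p_e).

    Undefined when p_e == 1 (all predictions and labels in one class).
    """
    cm = np.zeros((num_classes, num_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    p_o = np.trace(cm) / n                          # observed agreement (overall accuracy)
    p_e = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2  # chance agreement from the marginals
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    class_txt = rng.normal(size=(3, 16))   # stand-in embeddings for 3 class descriptions
    labels = np.array([0, 1, 2, 0, 1, 2])
    # Stand-in image features placed close to their class text embeddings.
    img = class_txt[labels] + 0.1 * rng.normal(size=(6, 16))
    preds = zero_shot_predict(img, class_txt)
    print("predictions:", preds)
    print("kappa:", cohen_kappa(labels, preds, 3))
    print("alignment loss:", contrastive_alignment_loss(img, class_txt[labels]))
```

In this sketch, zero-shot prediction works because classification reduces to a nearest-text lookup in the shared space: a new scene only needs text embeddings for its class descriptions, not new labeled images.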
ISSN: 0196-2892; 1558-0644
DOI: 10.1109/TGRS.2024.3495765