S2HM2: A Spectral-Spatial Hierarchical Masked Modeling Framework for Self-Supervised Feature Learning and Classification of Large-Scale Hyperspectral Images

Bibliographic details
Published in: IEEE Transactions on Geoscience and Remote Sensing, 2024, Vol. 62, pp. 1-19
Main authors: Tu, Lilin; Li, Jiayi; Huang, Xin; Gong, Jianya; Xie, Xing; Wang, Leiguang
Format: Article
Language: English
Description
Summary: Most existing deep learning-based hyperspectral image (HSI) classification algorithms rely on supervised learning, which requires a large number of annotated labels that are costly to acquire. Self-supervised learning (SSL) methods can learn abundant representations from a large amount of unlabeled data, thereby reducing the reliance on labels. In particular, SSL based on masked image modeling (MIM) can extract fine-grained features, which makes it well suited to HSI classification as a pixel-level interpretation task. However, MIM has scarcely been investigated in HSI classification. Current algorithms lack a comprehensive consideration of the multiscale spectral-spatial characteristics of HSI when constructing the pretraining task, and they incur high computational cost and redundancy when applied to large-scale HSIs. Therefore, this article develops an SSL framework based on spectral-spatial hierarchical masked modeling (S2HM2) for large-scale HSI classification. Considering the spectral-spatial characteristics of HSI, a 3-D masking strategy and a spectral-spatial consistency loss are proposed to construct the MIM task. To fully exploit features at each scale, a hierarchical 3-D feature pyramid network (3D-FPN) is designed as the decoder for both pretext and downstream tasks in a "pixel-to-pixel" manner. In addition, a multiscale masked feature modeling (MS-MFM) task is proposed to further facilitate multiscale feature learning. The SSL pretraining is guided by both MIM and MS-MFM. The experimental results on two large-scale hyperspectral datasets, i.e., WHU-OHS and WHU-H2SR, demonstrate the superiority of the proposed method. Furthermore, transfer learning experiments are conducted on a variety of hyperspectral datasets, where classification accuracies are boosted in most scenarios. The source code will be made available at https://github.com/tulilin/S2HM2.
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2024.3392962