S2HM2: A Spectral-Spatial Hierarchical Masked Modeling Framework for Self-Supervised Feature Learning and Classification of Large-Scale Hyperspectral Images
Published in: | IEEE Transactions on Geoscience and Remote Sensing, 2024, Vol. 62, pp. 1-19 |
---|---|
Main authors: | |
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | Most existing deep learning-based hyperspectral image (HSI) classification algorithms rely on supervised learning, which requires a large number of annotated labels that are costly to acquire. Self-supervised learning (SSL) methods can learn rich representations from large amounts of unlabeled data, thereby reducing the reliance on labels. In particular, SSL based on masked image modeling (MIM) can extract fine-grained features, which makes it well suited to HSI classification, a pixel-level interpretation task. However, MIM has scarcely been investigated for HSI classification. Current algorithms do not comprehensively consider the multiscale spectral-spatial characteristics of HSI when constructing the pretraining task, and they incur high computational cost and redundancy when applied to large-scale HSIs. Therefore, this article develops an SSL framework based on spectral-spatial hierarchical masked modeling (S2HM2) for large-scale HSI classification. To account for the spectral-spatial characteristics of HSI, a 3-D masking strategy and a spectral-spatial consistency loss are proposed to construct the MIM task. To fully exploit features at each scale, a hierarchical 3-D feature pyramid network (3D-FPN) is designed as the decoder for both the pretext and downstream tasks in a "pixel-to-pixel" manner. In addition, a multiscale masked feature modeling (MS-MFM) task is proposed to further facilitate multiscale feature learning. The SSL pretraining is guided by both MIM and MS-MFM. Experimental results on two large-scale hyperspectral datasets, WHU-OHS and WHU-H2SR, demonstrate the superiority of the proposed method. Furthermore, transfer learning experiments are conducted on a variety of hyperspectral datasets, and classification accuracy improves in most scenarios. The source code will be made available at https://github.com/tulilin/S2HM2 . |
---|---|
ISSN: | 0196-2892, 1558-0644 |
DOI: | 10.1109/TGRS.2024.3392962 |
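
The abstract mentions a 3-D masking strategy for building the MIM pretext task but gives no details. As a rough illustration of the general idea only, the sketch below masks random non-overlapping blocks of a hyperspectral cube along the spectral and both spatial axes. The function names, block size, mask ratio, and uniform random sampling are all assumptions made for this example; they are not taken from the paper or its repository.

```python
import random
from itertools import product


def make_3d_block_mask(bands, height, width, block=(4, 2, 2), mask_ratio=0.5, seed=0):
    """Return mask[b][y][x] (nested lists of bool); True marks a masked voxel.

    Hypothetical helper: the cube is split into non-overlapping
    (spectral, row, column) blocks and a fixed fraction of the blocks is
    masked uniformly at random. This is an assumed scheme, not the authors'.
    """
    pb, ph, pw = block
    if bands % pb or height % ph or width % pw:
        raise ValueError("cube dimensions must be divisible by the block size")

    # Enumerate every block by its (spectral, row, column) grid index.
    grid = list(product(range(bands // pb), range(height // ph), range(width // pw)))
    n_masked = round(mask_ratio * len(grid))

    # Seeded generator so the same mask can be reproduced.
    rng = random.Random(seed)
    masked_blocks = set(rng.sample(grid, n_masked))

    # A voxel is masked when the block that contains it was selected.
    return [
        [
            [(b // pb, y // ph, x // pw) in masked_blocks for x in range(width)]
            for y in range(height)
        ]
        for b in range(bands)
    ]


def masked_fraction(mask):
    """Fraction of voxels in the cube that are masked."""
    total = sum(len(row) for band in mask for row in band)
    hidden = sum(v for band in mask for row in band for v in row)
    return hidden / total


# Toy cube: 8 bands, 4 x 4 pixels, split into 8 blocks of size 4 x 2 x 2.
mask = make_3d_block_mask(bands=8, height=4, width=4, block=(4, 2, 2), mask_ratio=0.5)
```

Masking whole blocks rather than single voxels means a hidden value cannot be recovered trivially from its immediate spectral or spatial neighbors, which is the usual motivation for block-wise masking in masked modeling.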