Vision Transformer-Based Ensemble Learning for Hyperspectral Image Classification

Hyperspectral image (HSI) classification, due to its characteristic combination of images and spectra, has important applications in various fields through pixel-level image classification. The fusion of spatial–spectral features is a topic of great interest in the context of hyperspectral image cla...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Remote sensing (Basel, Switzerland) Switzerland), 2023-11, Vol.15 (21), p.5208
Hauptverfasser:	Liu, Jun, Guo, Haoran, He, Yile, Li, Huali
Format:	Artikel
Sprache:	eng
Schlagworte:	Accuracy Classification Computational linguistics Computer vision Deep learning Discriminant analysis Ensemble learning hyperspectral image classification Hyperspectral imaging Image classification Language processing Learning Machine learning Machine vision Medical screening Natural language interfaces Neural networks Pixels Remote sensing Spatial distribution spatial shuffle Support vector machines Training vision transformer Wavelet transforms
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Hyperspectral image (HSI) classification, due to its characteristic combination of images and spectra, has important applications in various fields through pixel-level image classification. The fusion of spatial–spectral features is a topic of great interest in the context of hyperspectral image classification, which typically requires selecting a larger spatial neighborhood window, potentially leading to overlaps between training and testing samples. Vision Transformer (ViTs), with their powerful global modeling abilities, have had a significant impact in the field of computer vision through various variants. In this study, an ensemble learning framework for HSI classification is proposed by integrating multiple variants of ViTs, achieving high-precision pixel-level classification. Firstly, the spatial shuffle operation was introduced to preprocess the training samples for HSI classification. By randomly shuffling operations using smaller spatial neighborhood windows, a greater potential spatial distribution of pixels can be described. Then, the training samples were transformed from a 3D cube to a 2D image, and a learning framework was built by integrating seven ViT variants. Finally, a two-level ensemble strategy was employed to achieve pixel-level classification based on the results of multiple ViT variants. Our experimental results demonstrate that the proposed ensemble learning framework achieves stable and significantly high classification accuracy on multiple publicly available HSI datasets. The proposed method also shows notable classification performance with varying numbers of training samples. Moreover, herein, it is proven that the spatial shuffle operation plays a crucial role in improving classification accuracy. By introducing superior individual classifiers, the proposed ensemble framework is expected to achieve even better classification performance.
ISSN:	2072-4292 2072-4292
DOI:	10.3390/rs15215208