Spatial-Spectral Transformer With Conditional Position Encoding for Hyperspectral Image Classification

Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2024, Vol. 21, pp. 1-5
Authors: Ahmad, Muhammad; Usama, Muhammad; Khan, Adil Mehmood; Distefano, Salvatore; Altuwaijri, Hamad Ahmed; Mazzara, Manuel
Format: Article
Language: English
Abstract: In Transformer-based hyperspectral image classification (HSIC), predefined positional encodings (PEs) are crucial for capturing the order of each input token. However, their typical representation as fixed-dimensional learnable vectors makes it challenging to adapt to variable-length input sequences, thereby limiting the broader application of Transformers for HSIC. To address this issue, this study introduces an implicit conditional PE (CPE) scheme in a Transformer for HSIC, conditioned on each input token's local neighborhood. The proposed spatial-spectral Transformer (SSFormer) integrates spatial-spectral information and enhances classification performance by incorporating a CPE mechanism, thereby increasing the Transformer layers' capacity to preserve contextual relationships within the HSI data. Moreover, SSFormer combines cross-attention between patches and the proposed learnable embeddings, enabling the model to capture global and local features simultaneously while addressing the constraint of limited training samples in a computationally efficient manner. Extensive experiments on publicly available HSI benchmark datasets validate the effectiveness of the proposed SSFormer model. The results demonstrate remarkable performance, with classification accuracies of 97.7% on the Indian Pines dataset and 96.08% on the University of Houston dataset.
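The record itself contains no code. As a rough illustration of the general idea behind a conditional position encoding conditioned on each token's local neighborhood, a minimal PyTorch sketch is given below; it assumes a CPVT-style depthwise-convolution generator, and the class name, kernel size, and residual addition are illustrative assumptions rather than the authors' SSFormer implementation.

```python
import torch
import torch.nn as nn


class ConditionalPositionEncoding(nn.Module):
    """Sketch of a conditional position encoding (CPE): positions are
    generated from each token's local spatial neighborhood with a
    depthwise convolution, so the encoding adapts to any token count
    instead of relying on a fixed-length learnable table."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise conv over the spatial grid of tokens; "same" padding
        # keeps the number of tokens unchanged, groups=dim acts per channel.
        self.proj = nn.Conv2d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)

    def forward(self, tokens: torch.Tensor, height: int, width: int) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim) with num_tokens == height * width
        b, n, c = tokens.shape
        feat = tokens.transpose(1, 2).reshape(b, c, height, width)
        # Position signal conditioned on each token's neighborhood,
        # added residually to the input tokens.
        return tokens + self.proj(feat).flatten(2).transpose(1, 2)


if __name__ == "__main__":
    cpe = ConditionalPositionEncoding(dim=64)
    patches = torch.randn(2, 7 * 7, 64)   # e.g. a 7x7 grid of HSI patch tokens
    out = cpe(patches, height=7, width=7)
    print(out.shape)                      # torch.Size([2, 49, 64])
```

Because the encoding is produced by a convolution over the token grid rather than looked up from a fixed-size table, it adapts to arbitrary patch sizes, which is the property the abstract highlights for handling variable-length input sequences.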
ISSN: 1545-598X, 1558-0571
DOI: 10.1109/LGRS.2024.3431188