Local Window Attention Transformer for Polarimetric SAR Image Classification



Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2023-01, Vol. 20, p. 1-1
Main authors: Jamali, Ali; Roy, Swalpa Kumar; Bhattacharya, Avik; Ghamisi, Pedram
Format: Article
Language: English
Description
Abstract: Convolutional neural networks (CNNs) have recently attracted great attention in image classification, since deep CNNs have exhibited excellent performance in computer vision. Owing to this success, scientists have lately been exploring the use of transformers in Earth observation applications. Nevertheless, the primary issue with transformers is that they demand significantly more training data than CNN classifiers. Their use in remote sensing is therefore considered challenging, notably with polarimetric SAR (PolSAR) data, due to the scarcity of labeled samples. In this letter, we develop and propose a vision transformer-based framework that employs 3D and 2D CNNs as feature extractors together with local window attention for the effective classification of PolSAR data. Extensive experimental results demonstrated that the developed model, PolSARFormer, obtained better classification accuracy than the state-of-the-art Swin Transformer and FNet algorithms, outperforming them by margins of 5.86% and 17.63%, respectively, in average accuracy on the San Francisco benchmark. Moreover, results on the Flevoland dataset showed that PolSARFormer surpasses several other algorithms, including ResNet (97.49%), Swin Transformer (96.54%), FNet (95.28%), 2D CNN (94.57%), and AlexNet (91.83%), achieving a kappa index of 99.30%. The code will be made available publicly at https://github.com/aj1365/PolSARFormer.
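The core idea named in the abstract, local window attention, restricts self-attention to small non-overlapping spatial windows so that the cost scales with window size rather than with the whole image. The following is a minimal NumPy sketch of that mechanism under stated assumptions: it is not the authors' PolSARFormer implementation, it omits the learned query/key/value projections and multi-head splitting, and the function name and window size are illustrative only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_window_attention(x, win=2):
    """Self-attention restricted to non-overlapping win x win windows.

    x: (H, W, C) feature map; H and W must be divisible by win.
    Simplification: Q = K = V = the raw features (no learned projections).
    """
    H, W, C = x.shape
    # Partition the map into (num_windows, win*win, C) token groups
    xw = x.reshape(H // win, win, W // win, win, C)
    xw = xw.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)
    # Scaled dot-product attention, computed independently per window
    attn = softmax(xw @ xw.transpose(0, 2, 1) / np.sqrt(C), axis=-1)
    out = attn @ xw
    # Reverse the window partition back to the (H, W, C) layout
    out = out.reshape(H // win, W // win, win, win, C)
    return out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

feat = np.random.default_rng(0).normal(size=(4, 4, 8))
y = local_window_attention(feat, win=2)
print(y.shape)  # (4, 4, 8)
```

Because each 2x2 window attends only within itself, the attention matrix per window is (win*win) x (win*win) instead of (H*W) x (H*W), which is what makes window attention tractable on modest labeled PolSAR datasets.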
ISSN: 1545-598X, 1558-0571
DOI: 10.1109/LGRS.2023.3239263