Faster and Better: A Lightweight Transformer Network for Remote Sensing Scene Classification

Bibliographic details
Published in:	Remote sensing (Basel, Switzerland), 2023-07, Vol.15 (14), p.3645
Main authors:	Huang, Xinyan; Liu, Fang; Cui, Yuanhao; Chen, Puhua; Li, Lingling; Li, Pengfang
Format:	Article
Language:	English
Subjects:	Accuracy; Analysis; Artificial neural networks; Attentional bias; Classification; Complexity; Computer applications; Computing costs; convolutional neural network; Datasets; Design; Image retrieval; Lightweight; lightweight transformer; Modules; Neural networks; Remote sensing scene classification
Online access:	Full text

Description
Abstract:	Remote sensing (RS) scene classification has received considerable attention due to its wide applications in the RS community. Many methods based on convolutional neural networks (CNNs) have been proposed to classify complex RS scenes, but they cannot fully capture the context in RS images because they lack long-range dependencies (the dependency relationship between two distant elements). Recently, some researchers have fine-tuned a large pretrained vision transformer (ViT) on small RS datasets to extract long-range dependencies effectively in RS scenes. However, fine-tuning the ViT usually takes more time because of its high computational complexity, and the lack of good local feature representation in the ViT limits further gains in classification performance. To this end, we propose a lightweight transformer network (LTNet) for RS scene classification. First, a multi-level group convolution (MLGC) module is presented. It enriches the diversity of local features and requires a lower computational cost by co-representing multi-level and multi-group features in a single module. Then, based on the MLGC module, a lightweight transformer block, LightFormer, is designed to capture global dependencies with fewer computing resources. Finally, the LTNet is built using the MLGC and LightFormer. Experiments that fine-tune the LTNet on four RS scene classification datasets demonstrate that the proposed network achieves competitive classification performance with less training time.
ISSN:	2072-4292
DOI:	10.3390/rs15143645
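
The abstract attributes the MLGC module's lower computational cost to group convolution but does not describe the module's internal layout. As a generic illustration only, and not the authors' MLGC design, the sketch below counts the weights of a standard 2D convolution versus a grouped one. The function name `conv_params` and the channel and kernel sizes are assumptions chosen for the example; bias terms are ignored.

```python
# Generic illustration of why group convolution is cheaper.
# This is NOT the MLGC module from the paper; the sizes below are assumed.

def conv_params(c_in, c_out, kernel, groups=1):
    """Weight count of a 2D convolution with a square kernel (bias ignored).

    Each output channel only sees c_in / groups input channels, so the
    weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return kernel * kernel * (c_in // groups) * c_out


standard = conv_params(64, 64, 3)            # ordinary 3x3 convolution
grouped = conv_params(64, 64, 3, groups=4)   # same shape, split into 4 groups

print(standard, grouped, standard // grouped)
```

With these assumed sizes, splitting the channels into four groups cuts the weight count by a factor of four. The multiply-accumulate count per output position falls by the same factor, since it is proportional to the number of weights.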