Onboard Satellite Image Classification for Earth Observation: A Comparative Study of ViT Models
Format: | Article |
---|---|
Language: | English |
Abstract: | This study identifies the most effective pre-trained model for
land use classification in onboard satellite processing, with emphasis on
high accuracy, computational efficiency, and robustness to the noisy data
conditions commonly encountered during satellite-based inference. Through
extensive experimentation, we compare the performance of traditional CNN-based,
ResNet-based, and various pre-trained Vision Transformer (ViT) models. Our
findings show that pre-trained ViT models, particularly MobileViTV2 and
EfficientViT-M2, outperform models trained from scratch in both accuracy and
efficiency. These models achieve high performance with reduced computational
requirements and are more resilient during inference under noisy conditions.
While MobileViTV2 excels on clean validation data, EfficientViT-M2 proves more
robust to noise, making it the most suitable model for onboard satellite Earth
observation (EO) tasks. Our experimental results indicate that EfficientViT-M2
is the best choice for reliable and efficient remote sensing image
classification (RS-IC) in satellite operations, achieving 98.76% accuracy,
precision, and recall. Specifically, EfficientViT-M2 delivers the highest
performance across all metrics, excels in training time (1,000 s) and inference
time (10 s), and demonstrates greater robustness (overall robustness score of
0.79). In addition, EfficientViT-M2 consumes 63.93% less power than MobileViTV2
(79.23 W) and 73.26% less power than SwinTransformer (108.90 W), a significant
advantage in energy efficiency. |
---|---|
DOI: | 10.48550/arxiv.2409.03901 |
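
The abstract reports power savings only as percentages relative to two other models. As a rough reading aid, the sketch below backs out the power draw those percentages imply for EfficientViT-M2. It assumes the wattages in parentheses (79.23 W and 108.90 W) are the draws of MobileViTV2 and SwinTransformer themselves, which the abstract suggests but does not state outright; the function name `implied_power` is hypothetical and not taken from the paper.

```python
def implied_power(baseline_watts: float, percent_less: float) -> float:
    """Power draw implied by consuming `percent_less` percent less than a baseline."""
    return baseline_watts * (1.0 - percent_less / 100.0)


# Assumption: the parenthetical wattages are the baseline models' own power draws.
from_mobilevit = implied_power(79.23, 63.93)   # relative to MobileViTV2
from_swin = implied_power(108.90, 73.26)       # relative to SwinTransformer

print(f"Implied by the MobileViTV2 comparison:     {from_mobilevit:.2f} W")
print(f"Implied by the SwinTransformer comparison: {from_swin:.2f} W")
```

Under that assumption the two comparisons imply roughly 28.58 W and 29.12 W, which differ by about half a watt. The abstract itself does not give an absolute figure for EfficientViT-M2, so either value should be treated as an estimate only.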