RingMo-Aerial: An Aerial Remote Sensing Foundation Model With A Affine Transformation Contrastive Learning
Main authors: | , , , , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | Aerial Remote Sensing (ARS) vision tasks pose significant challenges due to
the unique characteristics of their viewing angles. Existing research has
primarily focused on algorithms for specific tasks, which have limited
applicability in a broad range of ARS vision applications. This paper proposes
the RingMo-Aerial model, aiming to fill the gap in foundation model research in
the field of ARS vision. By introducing the Frequency-Enhanced Multi-Head
Self-Attention (FE-MSA) mechanism and an affine transformation-based
contrastive learning pre-training method, this work improves the model's
detection of small targets and adapts it to the tilted viewing angles
characteristic of ARS. Furthermore, the ARS-Adapter, a parameter-efficient
fine-tuning method, is proposed to improve the model's adaptability and
effectiveness in various ARS vision tasks. Experimental results demonstrate
that RingMo-Aerial achieves state-of-the-art (SOTA) performance on multiple downstream tasks. This
indicates the practicality and effectiveness of RingMo-Aerial in enhancing the
performance of ARS vision tasks. |
---|---|
DOI: | 10.48550/arxiv.2409.13366 |