Utilising SkyScript for Open-Vocabulary Categorization, Extraction, and Captioning to Enhance Multi-Modal Tasks in Remote Sensing

Bibliographic Details
Published in: Remote Sensing in Earth Systems Sciences (Online), 2024-09, Vol. 7 (3), pp. 149-158
Authors: Nagaraj, Saranya; Sivakumar, Shanmuga Priya; Annabel, Lawrence Sherly Puspha; Joshi, Vilas Ramrao; Patil, Mithun Baswaraj; Patil, Vishal Ratansing
Format: Article
Language: English
Online Access: Full text
Description
Summary: The SkyScript dataset was developed by integrating large-scale remote sensing images from Google Earth Engine with geo-tagged semantic data from OpenStreetMap. This open-access dataset, consisting of 2.6 million image-text pairs covering 29,000 unique tags, supports remote sensing tasks such as cross-modal retrieval, image captioning, and classification. The dataset offers global coverage and semantic diversity, although high-resolution images are concentrated in the USA and Europe due to licensing constraints. The images, sourced from multiple collections with varying ground sampling distances, are paired with captions generated using a combination of rule-based methods and logistic regression models for tag classification. Experiments demonstrate that models pre-trained on SkyScript outperform those trained on other datasets in zero-shot classification and fine-grained attribute recognition, highlighting its potential for advancing vision-language models in remote sensing applications. Future improvements could involve expanding geographic coverage and refining caption quality with advanced language models.
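
For readers unfamiliar with the evaluation protocol mentioned above, zero-shot classification with a CLIP-style vision-language model works by embedding each candidate class name as a text prompt and assigning an image to the most similar prompt. The Python sketch below illustrates this with the open_clip library; the backbone, checkpoint name, prompt template, and class list are illustrative assumptions, not the exact configuration used in the paper.

import torch
import open_clip
from PIL import Image

# Illustrative backbone/weights; SkyScript-pretrained checkpoints may differ.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Hypothetical open-vocabulary class list for a remote sensing scene.
classes = ["airport", "bridge", "farmland", "parking lot", "stadium"]
prompts = tokenizer([f"a satellite image of a {c}" for c in classes])

image = preprocess(Image.open("scene.png")).unsqueeze(0)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(prompts)
    # Normalize, then rank classes by cosine similarity to the image.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(classes[probs.argmax().item()])

Because no class-specific training is involved, the same pipeline extends to any label set expressible in text, which is what makes the dataset's 29,000 unique tags relevant to open-vocabulary recognition.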
ISSN: 2520-8195
2520-8209
DOI: 10.1007/s41976-024-00113-3