Utilising SkyScript for Open-Vocabulary Categorization, Extraction, and Captioning to Enhance Multi-Modal Tasks in Remote Sensing
Published in: | Remote sensing in earth systems sciences (Online) 2024-09, Vol. 7 (3), pp. 149-158 |
---|---|
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | The SkyScript dataset was developed by integrating large-scale remote sensing images from Google Earth Engine with geo-tagged semantic data from OpenStreetMap. This open-access dataset, consisting of 2.6 million image-text pairs covering 29,000 unique tags, facilitates various remote sensing tasks such as cross-modal retrieval, image captioning, and classification. The dataset ensures global representation and semantic diversity, although it exhibits a higher concentration of high-resolution images from the USA and Europe due to licencing constraints. The images, sourced from multiple collections with varying ground sampling distances, are paired with captions generated using a combination of rule-based methods and logistic regression models for tag classification. Experiments demonstrate that models pre-trained on SkyScript outperform those trained on other datasets in zero-shot classification and fine-grained attribute recognition, highlighting its potential for advancing vision-language models in remote sensing applications. Future improvements could involve enhancing geographic coverage and refining caption quality using advanced language models. |
ISSN: | 2520-8195, 2520-8209 |
DOI: | 10.1007/s41976-024-00113-3 |
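The abstract says only that captions come from "a combination of rule-based methods and logistic regression models for tag classification". The Python sketch below illustrates that general two-stage idea. First, a per-tag logistic regression classifier decides which candidate OpenStreetMap tags to keep for an image. Second, a simple rule turns the kept tags into a caption. Everything concrete in it is an assumption for illustration: the per-tag weights, the two-number feature vector, the 0.5 threshold, and the phrase and caption rules (`tag_phrase`, `build_caption`) are all hypothetical. It does not reproduce the authors' features, classifiers, or caption templates.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-tag logistic-regression parameters: (weights, bias).
# These are NOT the SkyScript classifiers; they only illustrate the mechanism.
TagModel = Tuple[Sequence[float], float]
Tag = Tuple[str, str]  # an OSM key=value pair, e.g. ("building", "yes")


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tag_probability(features: Sequence[float], model: TagModel) -> float:
    """Logistic regression: p = sigmoid(w . x + b)."""
    weights, bias = model
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def select_tags(features: Sequence[float], candidates: List[Tag],
                models: Dict[Tag, TagModel], threshold: float = 0.5) -> List[Tag]:
    """Keep candidate tags whose classifier probability reaches the threshold."""
    return [tag for tag in candidates
            if tag_probability(features, models[tag]) >= threshold]


def tag_phrase(tag: Tag) -> str:
    """Hypothetical rule: 'yes'-valued tags are named by their key, others by their value."""
    key, value = tag
    text = key if value == "yes" else value
    return text.replace("_", " ")


def build_caption(tags: List[Tag]) -> str:
    """Hypothetical template that joins tag phrases into one caption."""
    if not tags:
        return "a satellite image"
    phrases = [tag_phrase(t) for t in tags]
    if len(phrases) == 1:
        return f"a satellite image of {phrases[0]}"
    return "a satellite image of " + ", ".join(phrases[:-1]) + " and " + phrases[-1]


if __name__ == "__main__":
    # Toy example: one image's (hypothetical) feature vector and three candidate tags.
    models = {
        ("building", "yes"): ([2.0, -1.0], 0.0),
        ("highway", "footway"): ([-1.0, 0.5], -1.0),
        ("landuse", "farmland"): ([0.5, 3.0], 0.0),
    }
    candidates = list(models)
    kept = select_tags([0.8, 0.1], candidates, models)
    print(build_caption(kept))
```

With these toy numbers the footway tag scores below the threshold and is dropped, so the caption mentions only the building and the farmland. Separating the visual-relevance filter from the text template keeps each stage simple to adjust. A more fluent caption could come from a language model in place of the template, which is the kind of refinement the abstract lists as future work.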