Towards Automatic Satellite Images Captions Generation Using Large Language Models
Format: Article
Language: English
Abstract: Automatic image captioning is a promising technique for conveying visual information using natural language. It can benefit various tasks in satellite remote sensing, such as environmental monitoring, resource management, and disaster management. However, one of the main challenges in this domain is the lack of large-scale image-caption datasets, as creating them requires substantial human expertise and effort. Recent research on large language models (LLMs) has demonstrated their impressive performance in natural language understanding and generation tasks. Nonetheless, most of them cannot handle images (GPT-3.5, Falcon, Claude, etc.), while conventional captioning models pre-trained on general ground-view images often fail to produce detailed and accurate captions for aerial images (BLIP, GIT, CM3, CM3Leon, etc.). To address this problem, we propose a novel approach, Automatic Remote Sensing Image Captioning (ARSIC), which automatically collects captions for remote sensing images by guiding LLMs to describe the images' object annotations. We also present a benchmark model that adapts the pre-trained generative image2text model (GIT) to produce high-quality captions for remote sensing images. Our evaluation demonstrates the effectiveness of our approach for collecting captions for remote sensing images.
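The abstract describes guiding a text-only LLM to caption a remote sensing image from its object annotations, without detailing the prompt. The sketch below is a minimal illustration of that idea, not the authors' implementation: the annotation schema, the prompt wording, the helper name `caption_from_annotations`, and the use of the OpenAI chat API with `gpt-3.5-turbo` are all assumptions for illustration (the abstract names GPT-3.5 only as an example of a text-only LLM).

```python
# Minimal sketch of annotation-guided caption collection, assuming a
# hypothetical annotation schema and prompt; not the authors' released code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def caption_from_annotations(annotations: list[dict]) -> str:
    """Guide a text-only LLM to describe an image via its object annotations.

    `annotations` is a hypothetical list of dicts such as
    {"category": "airplane", "count": 3, "location": "upper left"}.
    """
    # Serialize the structured annotations into a plain-text object summary.
    objects = "; ".join(
        f'{a["count"]} {a["category"]}(s) in the {a["location"]}'
        for a in annotations
    )
    # Ask the LLM to caption the scene from the summary alone, since it
    # cannot see the image itself.
    prompt = (
        "You are describing an aerial (remote sensing) image. "
        f"It contains: {objects}. "
        "Write one concise, factual caption. Do not invent objects."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


# Example usage with made-up annotations:
print(caption_from_annotations([
    {"category": "airplane", "count": 3, "location": "upper left"},
    {"category": "hangar", "count": 1, "location": "center"},
]))
```

Captions collected this way could then serve as training targets when fine-tuning an image captioner such as GIT, which is the role the abstract assigns to its benchmark model.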
DOI: 10.48550/arxiv.2310.11392