Towards Automatic Satellite Images Captions Generation Using Large Language Models

Automatic image captioning is a promising technique for conveying visual information using natural language. It can benefit various tasks in satellite remote sensing, such as environmental monitoring, resource management, disaster management, etc. However, one of the main challenges in this domain is the lack of large-scale image-caption datasets, as they require a lot of human expertise and effort to create.

Detailed description

Saved in:
Bibliographic details
Main authors: He, Yingxu; Sun, Qiqi
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator He, Yingxu
Sun, Qiqi
description Automatic image captioning is a promising technique for conveying visual information using natural language. It can benefit various tasks in satellite remote sensing, such as environmental monitoring, resource management, disaster management, etc. However, one of the main challenges in this domain is the lack of large-scale image-caption datasets, as they require a lot of human expertise and effort to create. Recent research on large language models (LLMs) has demonstrated their impressive performance in natural language understanding and generation tasks. Nonetheless, most of them cannot handle images (GPT-3.5, Falcon, Claude, etc.), while conventional captioning models pre-trained on general ground-view images often fail to produce detailed and accurate captions for aerial images (BLIP, GIT, CM3, CM3Leon, etc.). To address this problem, we propose a novel approach: Automatic Remote Sensing Image Captioning (ARSIC) to automatically collect captions for remote sensing images by guiding LLMs to describe their object annotations. We also present a benchmark model that adapts the pre-trained generative image2text model (GIT) to generate high-quality captions for remote-sensing images. Our evaluation demonstrates the effectiveness of our approach for collecting captions for remote sensing images.
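The abstract's core step is turning an image's object annotations into a prompt that guides a text-only LLM to write a caption. A minimal sketch of that prompt construction, assuming a simple label-plus-bounding-box annotation format; the function and field names here are illustrative, not taken from the paper:

```python
# Hypothetical sketch of the ARSIC-style caption-collection step: object
# annotations for a remote-sensing image are rendered as text so a text-only
# LLM (e.g. GPT-3.5) can describe the scene. Annotation schema is assumed.

def build_caption_prompt(annotations):
    """Render bounding-box annotations as a prompt for a text-only LLM."""
    lines = []
    for obj in annotations:
        x, y, w, h = obj["bbox"]
        lines.append(f"- {obj['label']} at (x={x}, y={y}, w={w}, h={h})")
    return (
        "You are describing a satellite image. It contains the following "
        "annotated objects (pixel coordinates):\n"
        + "\n".join(lines)
        + "\nWrite one concise, factual caption for the image."
    )

annotations = [
    {"label": "airplane", "bbox": (120, 80, 60, 40)},
    {"label": "runway", "bbox": (0, 150, 512, 90)},
]
prompt = build_caption_prompt(annotations)
print(prompt)
```

The resulting prompt would be sent to an LLM API, and the returned caption paired with the image to build the training set used to fine-tune the GIT benchmark model.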
doi_str_mv 10.48550/arxiv.2310.11392
format Article
creationdate 2023-10-17
rights http://creativecommons.org/licenses/by-nc-nd/4.0
oa free_for_read
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2310.11392
language eng
recordid cdi_arxiv_primary_2310_11392
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
title Towards Automatic Satellite Images Captions Generation Using Large Language Models
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T18%3A52%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Towards%20Automatic%20Satellite%20Images%20Captions%20Generation%20Using%20Large%20Language%20Models&rft.au=He,%20Yingxu&rft.date=2023-10-17&rft_id=info:doi/10.48550/arxiv.2310.11392&rft_dat=%3Carxiv_GOX%3E2310_11392%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true