IMAGE CAPTIONING USING TRANSFORMER WITH IMAGE FEATURE EXTRACTION BY XCEPTION AND INCEPTION-V3

Bibliographic details
Published in:	Jurnal Ilmiah Kursor: Menuju Solusi Teknologi Informasi 2024-07, Vol.12 (3), p.135-146
Main authors:	Pardede, Jasman, Fandi
Format:	Article
Language:	English
Subjects:	batch_size; image captioning; Inception-V3; Transformer; Xception
Online access:	Full text

Description
Abstract:	Image captioning is an image processing task that generates a text description of an image's content. How an image captioning model is formed depends on how the image is interpreted in relation to its given caption, and that interpretation depends in turn on the feature extraction used. This research proposes feature extraction with Xception and Inception-V3, with a Transformer generating the captions. Model performance is measured with BLEU and METEOR scores. Experiments on the Flickr8k dataset show that the best performance is obtained with Xception feature extraction and batch_size = 256. Compared with Inception-V3, Xception feature extraction increases BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR by 13.15%, 18.03%, 18.71%, 27.27%, and 15.43%, respectively. With Xception feature extraction, batch_size = 256 compared with batch_size = 128 increases BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR by 19.81%, 41.84%, 52.23%, 53.14%, and 31.56%, respectively.
ISSN:	0216-0544 2301-6914
DOI:	10.21107/kursor.v12i3.376
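
Note on the metrics: the abstract reports BLEU-n scores and percentage increases but does not say how they were computed. The following is a minimal sketch of the standard unsmoothed sentence-level BLEU definition (clipped n-gram precision, uniform weights, brevity penalty) together with a relative-increase helper. It is not the authors' evaluation code. The function names and the example captions are hypothetical, and the article may have used a library implementation, corpus-level scoring, or smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Each n-gram is clipped at its maximum count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Unsmoothed sentence-level BLEU-max_n with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_mean)


def relative_increase(new, old):
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100.0


# Hypothetical captions, not taken from Flickr8k or from the article.
candidate = "a dog runs on the grass".split()
references = [
    "a dog runs across the grass".split(),
    "the dog is running on grass".split(),
]
bleu_1 = bleu(candidate, references, 1)
bleu_2 = bleu(candidate, references, 2)
```

Assuming the reported increases are relative changes, each figure in the abstract would correspond to a call such as `relative_increase(score_xception, score_inception)`, where both arguments are hypothetical names for the two models' scores on the same metric.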