Folk Games Image Captioning using Object Attention

The result of a deep learning-based image captioning system with encoder-decoder framework relies heavily on the image feature extraction technique and the caption-based model. The accuracy of the model is heavily influenced by the proposed attention mechanism. The inability to distinguish between t...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Online) 2023-08, Vol.7 (4), p.758-766
Hauptverfasser: Akbar, Saiful, Sitohang, Benhard, Pardede, Jasman, Amal, Irfan, Yunastrian, Kurnianda, Ahmada, Marsa, Prameswari, Anindya
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The result of a deep learning-based image captioning system with encoder-decoder framework relies heavily on the image feature extraction technique and the caption-based model. The accuracy of the model is heavily influenced by the proposed attention mechanism. The inability to distinguish between the output of the attention model and the input expectation of the decoder can cause the decoder to give incorrect results. In this paper, we proposed an object-attention mechanism using object detection. Object detection outputs a bounding box and an object category label, which is then used as an image input into VGG16 for feature extraction and into a caption-based LSTM model. The experimental results showed that the system with object attention performed better than the system without object attention. BLEU-1, BLEU-2, BLEU-3, BLEU-4, and CIDER scores for the image captioning system with object attention improved 12.48%, 17.39%, 24.06%, 36.37%, and 43.50% respectively compared to the system without object attention.  
ISSN:2580-0760
2580-0760
DOI:10.29207/resti.v7i4.4708