Attention-Guided Image Captioning through Word Information



Bibliographic details
Published in: Sensors (Basel, Switzerland), 2021-11, Vol. 21 (23), p. 7982
Authors: Tang, Ziwei; Yi, Yaohua; Sheng, Hao
Format: Article
Language: English
Description
Abstract: Image captioning generates written descriptions of an image. In recent image captioning research, attention regions seldom cover all objects, so generated captions may lack object details and remain far from reality. In this paper, we propose a word-guided attention (WGA) method for image captioning. First, WGA extracts word information from the embedded word and the memory cell by applying transformation and multiplication. Then, WGA applies the word information to the attention results and obtains the attended feature vectors via elementwise multiplication. Finally, we apply WGA with words from different time steps to obtain previous-word guided attention (PW) and current-word attention (CW) in the decoder. Experiments on the MSCOCO dataset show that WGA achieves competitive performance against state-of-the-art methods: on the Karpathy test split, PW reaches a Bilingual Evaluation Understudy (BLEU-4) score of 39.1 and a Consensus-Based Image Description Evaluation (CIDEr-D) score of 127.6, and CW reaches a BLEU-4 score of 39.1 and a CIDEr-D score of 127.2.
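The abstract describes WGA only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes a standard additive (soft) attention over region features, and it forms the "word information" as a sigmoid gate over the elementwise product of a linearly transformed word embedding and a linearly transformed LSTM memory cell. That gate then rescales the attended feature elementwise. All weight names (`W_v`, `W_h`, `w_a`, `W_e`, `W_c`), the sigmoid, and the attention form are hypothetical choices. For PW, the caller passes the embedding of the previous word. The abstract does not say how CW obtains the current-step word, so the sketch leaves that choice to the caller.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x):
    # numerically stable softmax over a 1-D score vector
    z = np.exp(x - x.max())
    return z / z.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class WordGuidedAttention:
    """Hypothetical WGA sketch: attention output gated by word information.

    Weight names and the exact gating form are illustrative assumptions,
    not taken from the paper.
    """

    def __init__(self, d_feat, d_emb, d_hid, d_att, rng):
        s = 0.1
        self.W_v = rng.normal(0, s, (d_att, d_feat))  # projects region features
        self.W_h = rng.normal(0, s, (d_att, d_hid))   # projects decoder hidden state
        self.w_a = rng.normal(0, s, d_att)            # scores each region
        self.W_e = rng.normal(0, s, (d_feat, d_emb))  # transforms the word embedding
        self.W_c = rng.normal(0, s, (d_feat, d_hid))  # transforms the memory cell

    def attend(self, V, h):
        # V: (k, d_feat) region features; h: (d_hid,) hidden state
        scores = np.tanh(V @ self.W_v.T + self.W_h @ h) @ self.w_a  # (k,)
        alpha = softmax(scores)                                    # attention weights
        return alpha @ V, alpha                                    # (d_feat,), (k,)

    def word_info(self, e, c):
        # assumed: transform both inputs, multiply elementwise, squash to (0, 1)
        return sigmoid((self.W_e @ e) * (self.W_c @ c))            # (d_feat,)

    def __call__(self, V, h, c, e):
        att, alpha = self.attend(V, h)
        # apply word information to the attention result elementwise
        return self.word_info(e, c) * att, alpha


# Usage with random inputs (sizes are arbitrary)
k, d_feat, d_emb, d_hid, d_att = 36, 8, 6, 10, 5
wga = WordGuidedAttention(d_feat, d_emb, d_hid, d_att, rng)
V = rng.normal(size=(k, d_feat))   # e.g. 36 region features
h = rng.normal(size=d_hid)         # decoder hidden state
c = rng.normal(size=d_hid)         # decoder memory cell
e_prev = rng.normal(size=d_emb)    # previous word's embedding (PW-style)
v_hat, alpha = wga(V, h, c, e_prev)
print(v_hat.shape, alpha.sum())
```

Because the assumed gate lies in (0, 1), it can only shrink or keep each channel of the attended feature. This is one plausible reading of "applies word information to the attention results"; the paper's actual equations may differ.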
ISSN: 1424-8220
DOI: 10.3390/s21237982