Image captioning with transformer and knowledge graph
Published in: Pattern Recognition Letters, 2021-03, Vol. 143, p. 43-49
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract:
• We add a KL divergence term to distinguish between incorrect predictions.
• We leverage a knowledge graph to help the Transformer model generate captions.
• Our methods efficiently improve the performance of the Transformer model.

The Transformer model has achieved very good results in machine translation tasks. In this paper, we adopt the Transformer model for the image captioning task. To improve image captioning performance, we enhance the Transformer model in two ways. First, we augment the maximum likelihood estimation (MLE) objective with an extra Kullback-Leibler (KL) divergence term to distinguish between incorrect predictions. Second, we introduce a method that helps the Transformer model generate captions by leveraging a knowledge graph. Experiments on benchmark datasets demonstrate the effectiveness of our method.
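The abstract only states that the MLE objective is augmented with a KL term; the exact formulation is not given there. The following is a minimal sketch of one plausible reading, assuming a standard per-token cross-entropy (MLE) term plus a KL penalty that pushes the probability mass assigned to non-target tokens toward a uniform reference, so no single incorrect word is strongly preferred. The names `captioning_loss`, `kl_weight`, `pad_id`, and the uniform reference are illustrative assumptions, not the authors' published method.

```python
# Sketch of an MLE loss augmented with a KL divergence over incorrect predictions.
# Assumptions: per-token cross-entropy as the MLE term; the KL term compares the
# renormalized distribution over non-target tokens with a uniform reference.
import torch
import torch.nn.functional as F

def captioning_loss(logits, targets, kl_weight=0.1, pad_id=0):
    """logits: (batch, seq_len, vocab); targets: (batch, seq_len) token ids."""
    vocab = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)

    # MLE term: negative log-likelihood of the reference caption tokens.
    nll = F.nll_loss(log_probs.transpose(1, 2), targets,
                     ignore_index=pad_id, reduction="mean")

    # KL term: remove the target token's probability, renormalize over the
    # remaining (incorrect) tokens, and compare with a uniform distribution.
    probs = log_probs.exp()
    target_mask = F.one_hot(targets, vocab).bool()
    wrong_probs = probs.masked_fill(target_mask, 0.0)
    wrong_probs = wrong_probs / wrong_probs.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    uniform = torch.full_like(wrong_probs, 1.0 / (vocab - 1))
    kl = F.kl_div(wrong_probs.clamp_min(1e-8).log(), uniform,
                  reduction="batchmean")

    return nll + kl_weight * kl
```

In this reading, the cross-entropy part drives the correct token's probability up, while the KL part penalizes concentrating the leftover probability on a few incorrect tokens; `kl_weight` would control the trade-off between the two terms.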
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2020.12.020