Topic scene graphs for image captioning

Bibliographic Details
Published in: IET Computer Vision, 2022-06, Vol. 16 (4), p. 364-375
Authors: Zhang, Min; Chen, Jingxiang; Li, Pengfei; Jiang, Ming; Zhou, Zhe
Format: Article
Language: English
Description
Abstract: When describing an image, people can rapidly extract the topic of the image and find the main object, generating sentences that match the main idea of the image. However, most scene graph generation methods do not emphasise the importance of the image's topic. Consequently, the captions generated by scene graph-based image captioning models cannot reflect the topic of the image and thus fail to express its central idea. In this paper, we propose a method for image captioning based on topic scene graphs (TSG). First, we propose the topic scene graph structure, which expresses an image's topic and the relationships between objects. Then, combined with this structure, we utilise salient object detection to generate a topic scene graph that highlights the salient objects of the image. Note that our framework is agnostic to the underlying scene graph-based image captioning model and can therefore be widely applied wherever salient object predictions are needed. We compare our topic scene graph with state-of-the-art scene graph generation models and mainstream image captioning models on the MSCOCO and Visual Genome datasets, achieving better performance on both.
ISSN: 1751-9632, 1751-9640
DOI: 10.1049/cvi2.12093
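
To make the data structure described in the abstract concrete, the sketch below shows one plausible way to represent a topic scene graph: object nodes annotated with salience scores from a salient object detector, relations between them, and the most salient nodes treated as the image's topic. This is a minimal illustration, not the authors' implementation; the names ObjectNode, Relation, TopicSceneGraph and topic_nodes are assumptions made for this example.

```python
# Hypothetical sketch of a topic scene graph (not the paper's code):
# object nodes carry salience scores, and the top-scoring nodes form the topic.
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    name: str         # object category, e.g. "man"
    salience: float   # score from a salient object detector, in [0, 1]


@dataclass
class Relation:
    subject: int      # index of the subject node
    predicate: str    # relationship label, e.g. "riding"
    obj: int          # index of the object node


@dataclass
class TopicSceneGraph:
    nodes: list[ObjectNode] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def topic_nodes(self, k: int = 3) -> list[ObjectNode]:
        """Return the k most salient objects, i.e. the graph's topic."""
        return sorted(self.nodes, key=lambda n: n.salience, reverse=True)[:k]


# Usage: build a tiny graph and query its topic.
g = TopicSceneGraph(
    nodes=[ObjectNode("man", 0.9), ObjectNode("horse", 0.8), ObjectNode("fence", 0.2)],
    relations=[Relation(0, "riding", 1), Relation(1, "behind", 2)],
)
print([n.name for n in g.topic_nodes(k=2)])  # ['man', 'horse']
```

Under this reading, a downstream scene graph-based captioning model could be fed the topic nodes and their relations first, which is one way the framework could remain agnostic to the specific captioning model, as the abstract claims.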