A Text-Guided Generation and Refinement Model for Image Captioning

A high-quality image description requires not only the logic and fluency of language but also the richness and accuracy of content. However, due to the semantic gap between vision and language, most existing image captioning approaches that directly learn the cross-modal mapping from vision to languag...

Bibliographic details
Published in:	IEEE Transactions on Multimedia, 2023, Vol. 25, pp. 2966-2977
Main authors:	Wang, Depeng; Hu, Zhenzhen; Zhou, Yuanen; Hong, Richang; Wang, Meng
Format:	Article
Language:	English
Subjects:	Attention mechanism; Coders; Cognition; Decoding; generating and refining decoder; Generators; image captioning; Image quality; Modules; Refining; Salience; Semantics; Sports; Sports equipment; text-guided; Training; Vision; Visualization