Image captioning with weakly-supervised attention penalty

Bibliographic details
Main authors:	YU, Yen-Yun, MOGHTADERI, Azadeh, EBRAHIMPOUR, Mohammad K, LI, Jiayun
Format:	Patent
Language:	English
Subjects:	CALCULATING; COMPUTING; COUNTING; PHYSICS

Description
Summary:	Techniques for training a machine-learning (ML) model for captioning images are disclosed. A visual model of the ML model generates a plurality of feature vectors and a plurality of visual attention maps based on an input image. Each of the plurality of feature vectors corresponds to a different region of the input image. An attention model of the ML model generates a plurality of caption attention maps based on the plurality of feature vectors. An attention penalty is calculated based on a comparison between the caption attention maps and the visual attention maps. A loss function is calculated based on the attention penalty. One or both of the visual model and the attention model are trained using the loss function.
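
The summary does not say how the caption attention maps are compared with the visual attention maps, or how the penalty enters the loss. The following is a minimal sketch under stated assumptions, not the patented method: each caption attention map is paired one-to-one with a visual attention map over the same image regions, the comparison is a mean squared difference after normalizing each map to sum to 1, and the loss is a captioning loss plus a weighted penalty. The names `attention_penalty`, `total_loss`, and `penalty_weight` are hypothetical.

```python
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative attention weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention map must have positive total weight")
    return [w / total for w in weights]


def attention_penalty(
    caption_maps: Sequence[Sequence[float]],
    visual_maps: Sequence[Sequence[float]],
) -> float:
    """Mean squared difference between paired attention maps.

    Assumption: map i of caption_maps is compared with map i of
    visual_maps, and both cover the same image regions in the same order.
    """
    if len(caption_maps) != len(visual_maps):
        raise ValueError("expected the same number of maps on both sides")
    squared_error = 0.0
    region_count = 0
    for cap, vis in zip(caption_maps, visual_maps):
        if len(cap) != len(vis):
            raise ValueError("paired maps must cover the same regions")
        cap_n, vis_n = normalize(cap), normalize(vis)
        squared_error += sum((c - v) ** 2 for c, v in zip(cap_n, vis_n))
        region_count += len(cap_n)
    return squared_error / region_count


def total_loss(caption_loss: float, penalty: float,
               penalty_weight: float = 1.0) -> float:
    """Assumed form of the loss: captioning loss plus weighted penalty."""
    return caption_loss + penalty_weight * penalty


if __name__ == "__main__":
    # Two caption attention maps over four image regions (made-up values).
    caption_maps = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
    visual_maps = [[0.4, 0.2, 0.2, 0.2], [0.1, 0.1, 0.1, 0.7]]
    penalty = attention_penalty(caption_maps, visual_maps)
    print(total_loss(caption_loss=2.0, penalty=penalty, penalty_weight=0.5))
```

In this sketch the penalty is zero when the caption attention agrees with the visual attention and grows as the two diverge, so minimizing the loss pushes the attention model toward the regions highlighted by the visual model. A real training setup would compute the same quantity on tensors inside an automatic-differentiation framework so that gradients reach one or both models.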