Image-text retrieval method and system based on cross-modal semantic analysis

Bibliographic details
Main authors:	LI ZIXU, HU YUPENG, MOU YANSONG, WANG KUN, LI MING, TIAN YANG
Format:	Patent
Language:	chi ; eng
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Abstract:	The invention relates to an image-text retrieval method and system based on cross-modal semantic analysis. The method comprises the following steps. Image representation: a given image is understood, and feature codes of a salient region are generated. Text representation: a given text query statement is understood, and context-related discrete vocabulary codes are generated. Intra-modal feature fusion: a self-attention mechanism is applied to the image and text representations. Retrieval: the cosine similarity of each image-text pair is calculated separately from a hash code and from a quantized code, both generated from the aggregated features; two rounds of sorting screen out a top-ranked candidate set; a cross-modal attention mechanism is then applied to the candidate set to obtain a relatively accurate fine-grained matching score; and similarity re-sorting is used to fine-tune the ranking relationship internally to obtain a fine-grained matchin
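
The coarse-to-fine ranking described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify how the hash code, the quantized code, or the attention score are computed, so everything below is an assumption chosen for illustration. In particular, mean pooling stands in for the self-attention feature aggregation, a sign function stands in for the hash code, uniform rounding stands in for the quantized code, and the cross-modal attention is a simple softmax over word-region cosine similarities. The names `hash_code`, `quantized_code`, `cross_attention_score`, `retrieve`, `k1`, `k2`, `step`, and `temperature` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_pool(vectors):
    """Assumed stand-in for the aggregated feature: average the vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hash_code(vec):
    """Assumed hash code: the sign of each dimension, as +1.0 or -1.0."""
    return [1.0 if x >= 0 else -1.0 for x in vec]


def quantized_code(vec, step=0.5):
    """Assumed quantized code: round each dimension to a multiple of `step`."""
    return [round(x / step) * step for x in vec]


def cross_attention_score(words, regions, temperature=4.0):
    """Assumed fine-grained score: each word attends over the image regions
    (softmax of scaled cosine similarities); the score is the mean cosine
    between each word and its attended region vector."""
    total = 0.0
    for w in words:
        logits = [temperature * cosine(w, r) for r in regions]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        norm = sum(weights)
        attended = [
            sum(wt * r[d] for wt, r in zip(weights, regions)) / norm
            for d in range(len(w))
        ]
        total += cosine(w, attended)
    return total / len(words)


def retrieve(query_words, gallery, k1=3, k2=2):
    """Two rounds of coarse sorting, then fine-grained re-sorting.

    query_words: list of word vectors for the text query.
    gallery: dict mapping an image name to its list of region vectors.
    Returns (name, fine_score) pairs, best match first.
    """
    q = mean_pool(query_words)
    agg = {name: mean_pool(regions) for name, regions in gallery.items()}

    # Round 1: rank every image by cosine similarity of hash codes, keep k1.
    qh = hash_code(q)
    round1 = sorted(
        gallery, key=lambda n: cosine(qh, hash_code(agg[n])), reverse=True
    )[:k1]

    # Round 2: rank the survivors by cosine similarity of quantized codes, keep k2.
    qq = quantized_code(q)
    round2 = sorted(
        round1, key=lambda n: cosine(qq, quantized_code(agg[n])), reverse=True
    )[:k2]

    # Fine stage: score only the small candidate set with cross-modal attention.
    scored = [(n, cross_attention_score(query_words, gallery[n])) for n in round2]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The point of the staged design is cost: the two cheap code comparisons run over the whole gallery, while the more expensive word-by-region attention runs only on the few candidates that survive both rounds.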