Image-text retrieval method and system based on cross-modal semantic analysis


Bibliographic details
Main authors: LI ZIXU, HU YUPENG, MOU YANSONG, WANG KUN, LI MING, TIAN YANG
Format: Patent
Language: Chinese; English
Description
Abstract: The invention relates to an image-text retrieval method and system based on cross-modal semantic analysis. The method comprises the following steps. Image representation: a given image is analyzed, and feature codes of a salient region are generated. Text representation: a given text query statement is analyzed, and context-dependent discrete vocabulary codes are generated. Intra-modal feature fusion is performed on the image and text representations using a self-attention mechanism. The cosine similarity of each image-text pair is calculated separately with a hash code and with a quantized code, both generated from the aggregated features, and a top-ranked candidate set is screened out through two rounds of sorting. A cross-modal attention mechanism is then applied to the candidate set to obtain a relatively accurate fine-grained matching score, and the ranking relationship is fine-tuned internally by similarity re-sorting to obtain a fine-grained matching...
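The coarse-to-fine flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name here (`hash_code`, `quantized_code`, `mean_pool`, `fine_grained_score`, `retrieve`) is hypothetical, and several parts are stand-ins chosen for brevity: mean pooling replaces the self-attention fusion, sign binarization stands in for the hash code, uniform rounding stands in for the quantized code, and a simple word-over-region softmax attention stands in for the cross-modal attention mechanism.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hash_code(v):
    """Assumed hash code: the sign of each dimension as +1.0 / -1.0."""
    return [1.0 if x >= 0 else -1.0 for x in v]

def quantized_code(v, step=0.25):
    """Assumed quantized code: each dimension rounded to a multiple of step."""
    return [round(x / step) * step for x in v]

def mean_pool(vectors):
    """Stand-in for intra-modal fusion: average local features into one vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fine_grained_score(words, regions, temperature=4.0):
    """Each word attends over the regions; return the mean word-context cosine."""
    total = 0.0
    for w in words:
        weights = [math.exp(temperature * cosine(w, r)) for r in regions]
        z = sum(weights)
        context = [sum(wt * r[d] for wt, r in zip(weights, regions)) / z
                   for d in range(len(w))]
        total += cosine(w, context)
    return total / len(words)

def retrieve(query_words, gallery, k1=3, k2=2):
    """gallery maps an image id to its list of region feature vectors."""
    q = mean_pool(query_words)
    pooled = {img: mean_pool(regs) for img, regs in gallery.items()}
    # Round 1: rank every image by cosine similarity of hash codes, keep k1.
    round1 = sorted(
        pooled,
        key=lambda i: cosine(hash_code(q), hash_code(pooled[i])),
        reverse=True,
    )[:k1]
    # Round 2: rank the survivors by cosine similarity of quantized codes, keep k2.
    round2 = sorted(
        round1,
        key=lambda i: cosine(quantized_code(q), quantized_code(pooled[i])),
        reverse=True,
    )[:k2]
    # Final step: re-sort the small candidate set by the fine-grained score.
    return sorted(
        round2,
        key=lambda i: fine_grained_score(query_words, gallery[i]),
        reverse=True,
    )

# Toy data (made up for illustration): 3-dimensional word and region features.
example_query = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]
example_gallery = {
    "a": [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]],
    "b": [[-1.0, 0.5, 0.0], [-0.8, 0.2, -0.3]],
    "c": [[0.1, 1.0, 0.9], [0.2, 0.8, 1.0]],
    "d": [[0.5, 0.5, 0.5], [0.6, 0.4, 0.5]],
}
ranked = retrieve(example_query, example_gallery)
```

The point of the two cheap rounds is that the comparatively expensive attention-based score is only computed for the `k2` surviving candidates rather than for the whole gallery. The values of `k1`, `k2`, `step`, and `temperature` are arbitrary choices for this sketch.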