Image-text retrieval method and system based on cross-modal semantic analysis
Main authors: | , , , , , |
---|---|
Format: | Patent |
Language: | Chinese ; English |
Subjects: | |
Online access: | Order full text |
Summary: | The invention relates to an image-text retrieval method and system based on cross-modal semantic analysis. The method comprises the following steps. Image representation: a given image is understood, and feature codes of a salient region are generated. Text representation: a given text query statement is understood, and context-related discrete vocabulary codes are generated. Intra-modal feature fusion is then performed on the image and text representations using a self-attention mechanism. Cosine similarity of each image-text pair is calculated separately using a hash code and a quantized code generated from the aggregated features, and a top-ranked candidate set is screened out through two rounds of sorting. A cross-modal attention mechanism is then introduced to compute a relatively accurate fine-grained matching score over the candidate set, and the ranking relationship is internally fine-tuned by similarity re-sorting to obtain a fine-grained matchin |
---|
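The coarse-to-fine ranking described in the summary can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: mean pooling stands in for the self-attention aggregation, `sign_hash` and `quantize` are hypothetical stand-ins for the learned hash and quantized codes, `cross_attention_score` is one simple form of word-to-region attention, and the toy vectors, `k1`, and `k2` are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_pool(vectors):
    """Assumed stand-in for the aggregated (self-attention fused) feature."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def sign_hash(vec):
    """Hypothetical hash code: +1/-1 per dimension by sign."""
    return [1.0 if x >= 0 else -1.0 for x in vec]

def quantize(vec, step=0.5):
    """Hypothetical quantized code: uniform scalar quantization."""
    return [round(x / step) * step for x in vec]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def cross_attention_score(regions, words):
    """Each word attends over the image regions; the score is the mean
    cosine similarity between a word and its attended region vector."""
    total = 0.0
    for w in words:
        weights = softmax([cosine(w, r) for r in regions])
        attended = [sum(wt * r[d] for wt, r in zip(weights, regions))
                    for d in range(len(w))]
        total += cosine(w, attended)
    return total / len(words)

def retrieve(query_words, gallery, k1=3, k2=2):
    """Two coarse sorting rounds, then fine-grained re-ranking."""
    q = mean_pool(query_words)
    agg = {name: mean_pool(regions) for name, regions in gallery.items()}
    # Round 1: cosine similarity on hash codes, keep the top k1.
    q_hash = sign_hash(q)
    round1 = sorted(gallery, key=lambda n: cosine(q_hash, sign_hash(agg[n])),
                    reverse=True)[:k1]
    # Round 2: cosine similarity on quantized codes, keep the top k2.
    q_quant = quantize(q)
    round2 = sorted(round1, key=lambda n: cosine(q_quant, quantize(agg[n])),
                    reverse=True)[:k2]
    # Fine stage: re-sort the candidates by cross-modal attention score.
    scored = [(n, cross_attention_score(gallery[n], query_words)) for n in round2]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy data (invented): region vectors per image, word vectors for the query.
GALLERY = {
    "cat": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "dog": [[0.1, 1.0, 0.0], [0.2, 0.9, 0.1]],
    "car": [[-1.0, -0.2, 0.9], [-0.8, -0.1, 1.0]],
    "tree": [[0.5, 0.5, -0.5], [0.4, 0.6, -0.6]],
}
QUERY = [[1.0, 0.0, 0.1], [0.8, 0.1, 0.0]]

for name, score in retrieve(QUERY, GALLERY):
    print(name, round(score, 3))
```

The point of the staged design is cost: the cheap code comparisons prune the gallery, so the more expensive attention score is computed only for the few surviving candidates.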