Token compression and bidirectional asymmetric matching multi-modal query image retrieval method


Bibliographic details
Main authors: CHEN BAITAO, CAI YUHANG, KE XIAO
Format: Patent
Language: chi; eng
Description
Summary: The invention provides a multi-modal query image retrieval method based on token compression and bidirectional asymmetric matching. The method comprises the following steps: S1, an input image is partitioned into blocks and each block is encoded, an input text is converted into a token sequence through word embedding, and token compression and encoding are performed on the serialized data; S2, an additional fusion token is added to the resulting context-fused token sequences of the image modality and the text modality, and token compression and encoding are performed again; S3, forward precise matching and reverse fuzzy matching are performed on the single-modality and fused-modality feature representations obtained in step S2, and the matching results are used to guide the learning process of the neural network; and S4, the neural network is trained, and the best model weights are retained for computing the feature representations of the test-set data, thereby realizing combined query image retrieval. With this method, the multi-modal query semantics can be fully fused, an...
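The data flow of steps S1 and S2 can be illustrated with a minimal sketch. The summary does not say how tokens are compressed, so everything below is an assumption made for illustration: the function names compress_tokens and add_fusion_token are hypothetical, tokens are plain lists of floats, and compression keeps the tokens with the largest L2 norm while averaging the remaining ones into a single token. It is not the patented procedure, and the matching steps S3 and S4 are not covered.

```python
import math


def compress_tokens(tokens, keep):
    """Keep the `keep` highest-scoring tokens and merge the rest into one averaged token.

    Scoring by L2 norm is an illustrative assumption, not the patent's criterion.
    """
    if keep >= len(tokens):
        return [list(t) for t in tokens]
    scores = [math.sqrt(sum(x * x for x in t)) for t in tokens]
    # Indices of the top-`keep` tokens, restored to their original sequence order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    top = sorted(ranked[:keep])
    rest = [i for i in range(len(tokens)) if i not in top]
    dim = len(tokens[0])
    # Average the discarded tokens so their information is not dropped entirely.
    merged = [sum(tokens[i][d] for i in rest) / len(rest) for d in range(dim)]
    return [list(tokens[i]) for i in top] + [merged]


def add_fusion_token(image_tokens, text_tokens, fusion_token):
    """Concatenate both modality sequences and append one extra fusion token (step S2)."""
    sequence = [list(t) for t in image_tokens] + [list(t) for t in text_tokens]
    return sequence + [list(fusion_token)]


# Toy 2-dimensional tokens standing in for encoded image blocks and word embeddings.
image_tokens = [[3.0, 4.0], [0.0, 1.0], [1.0, 0.0], [6.0, 8.0]]
text_tokens = [[1.0, 1.0], [2.0, 2.0]]

compressed_image = compress_tokens(image_tokens, keep=2)
compressed_text = compress_tokens(text_tokens, keep=1)
fused_sequence = add_fusion_token(compressed_image, compressed_text, [0.0, 0.0])
print(len(fused_sequence))
```

In a real model the fusion token would be a learned embedding and the fused sequence would pass through further encoder layers before being compressed again; the zero vector here is only a placeholder.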