Cross-modal retrieval method and retrieval system

Bibliographic details
Main authors: QIANG BAOHUA, XI GUANGYONG, CHEN RUIDONG, SUN PINGPING, YANG XIANYI
Format: Patent
Language: Chinese; English
Description
Abstract: The invention provides a cross-modal retrieval method and a retrieval system. The method comprises the following steps: encoding features with a CLIP pre-trained model to obtain the original modal features of an original image and a text; performing attention alignment on the original modal features to obtain modal alignment data, thereby establishing semantic correlation between the original modalities; preserving the modal invariance of the resulting data through a weight-shared multi-layer perceptron; and finally using an Arc4cmr loss function to distribute the obtained feature data on a normalized hypersphere and impose category-boundary constraints. With the disclosed method, the common representations of paired images and texts are brought as close together as possible, while intra-class compactness and inter-class separation are enhanced at the same time.
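The abstract names four stages (CLIP encoding, attention alignment, a weight-shared MLP, and the Arc4cmr loss on a normalized hypersphere) without giving their formulas. What follows is a minimal pure-Python sketch under stated assumptions, not the patented implementation. CLIP is not called; random vectors stand in for token-level image and text features. "Attention alignment" is assumed to be residual single-head cross-attention between the two modalities. Arc4cmr is assumed to behave like an ArcFace-style additive angular margin softmax with class centers shared by both modalities. The cosine pair term is an illustrative addition. All function names and the hyperparameters `s=30.0` and `m=0.5` are hypothetical.

```python
import math
import random


def l2_normalize(v):
    """Project a vector onto the unit hypersphere (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def matvec(W, v):
    """Multiply a row-major matrix W by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def cross_attention(queries, keys_values):
    """ASSUMED 'attention alignment': single-head scaled dot-product attention in which
    every query token attends over the other modality's tokens (used as both keys and
    values, no learned projections), followed by a residual connection."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        top = max(scores)
        weights = [math.exp(sc - top) for sc in scores]  # numerically stable softmax
        total = sum(weights)
        weights = [w / total for w in weights]
        attended = [sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(d)]
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def shared_mlp(v, W1, W2):
    """Two-layer ReLU MLP. The SAME W1/W2 are applied to image and text features,
    which is the weight-sharing the abstract relies on for modal invariance."""
    hidden = [max(0.0, x) for x in matvec(W1, v)]
    return matvec(W2, hidden)


def arc_margin_loss(feature, label, centers, s=30.0, m=0.5):
    """HYPOTHETICAL Arc4cmr stand-in: ArcFace-style additive angular margin softmax.
    Features and class centers are L2-normalized onto the unit hypersphere, the target
    angle is enlarged by m, and all cosines are scaled by s. (Real implementations
    usually also guard the case theta + m > pi; omitted here for brevity.)"""
    f = l2_normalize(feature)
    logits = []
    for j, center in enumerate(centers):
        cos_t = sum(a * b for a, b in zip(f, l2_normalize(center)))
        cos_t = max(-1.0, min(1.0, cos_t))  # keep acos in its domain
        if j == label:
            cos_t = math.cos(math.acos(cos_t) + m)
        logits.append(s * cos_t)
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]  # cross-entropy for the true class


def pair_distance(a, b):
    """Illustrative extra term: 1 - cosine similarity of a paired image and text."""
    return 1.0 - sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def encode_pair(img_tokens, txt_tokens, W1, W2):
    """Align each modality against the other, pool, then apply the shared MLP to both."""
    img_vec = mean_pool(cross_attention(img_tokens, txt_tokens))
    txt_vec = mean_pool(cross_attention(txt_tokens, img_tokens))
    return shared_mlp(img_vec, W1, W2), shared_mlp(txt_vec, W1, W2)


def random_matrix(rng, rows, cols, scale=0.3):
    """Gaussian random matrix used for stand-in weights and features."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, hidden_dim, n_classes = 8, 16, 3
    W1, W2 = random_matrix(rng, hidden_dim, dim), random_matrix(rng, dim, hidden_dim)
    centers = random_matrix(rng, n_classes, dim)  # class centers shared by both modalities
    img_tokens = random_matrix(rng, 4, dim, 1.0)  # stand-in for CLIP image token features
    txt_tokens = random_matrix(rng, 3, dim, 1.0)  # stand-in for CLIP text token features
    img_feat, txt_feat = encode_pair(img_tokens, txt_tokens, W1, W2)
    label = 1
    loss = (arc_margin_loss(img_feat, label, centers)
            + arc_margin_loss(txt_feat, label, centers)
            + pair_distance(img_feat, txt_feat))
    print(f"total loss: {loss:.4f}")
```

Sharing `W1`/`W2` and the class centers between modalities pushes image and text features of the same category toward the same region of the hypersphere. The angular margin `m` narrows the angular region each class may occupy, which is one plausible way to obtain the intra-class compactness and inter-class separation the abstract describes.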