Cross-modal retrieval method and retrieval system

Bibliographic details
Main authors:	QIANG BAOHUA, XI GUANGYONG, CHEN RUIDONG, SUN PINGPING, YANG XIANYI
Format:	Patent
Language:	Chinese; English
Subjects:	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Summary:	The invention provides a cross-modal retrieval method and system. The method comprises the following steps: encoding features with a pre-trained CLIP model to obtain the original modal features of an image and a text; applying attention alignment to the original modal features to obtain modality-aligned data, thereby establishing semantic correlation between the original modalities; preserving the modality invariance of the aligned data through a weight-shared multi-layer perceptron; and distributing the resulting feature data on a normalized hypersphere using an Arc4cmr loss function to impose category boundary constraints. With the disclosed method, the common representations of paired images and texts are brought as close together as possible, while intra-class compactness and inter-class separation are enhanced at the same time. 本发明提供了一种跨模态检索方法以及检索系统，所述检索方法包括：采用CLIP预训练模型对特征进行编码，获得包括原始图像以及文本的原始模态特征；将所述原始模态特征进行注意力对
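
The last two steps of the abstract (a weight-shared perceptron followed by a category boundary constraint on a normalized hypersphere) might be sketched as below. This is only an illustration under stated assumptions: the record does not define Arc4cmr, so an ArcFace-style additive angular margin is used as a stand-in; the CLIP encoding and attention alignment steps are omitted and replaced by hand-written vectors; all function names, weights, and the `margin` and `scale` values are hypothetical.

```python
import math

def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def shared_mlp(v, weights, biases):
    """One ReLU layer. The same weights are applied to both modalities."""
    return [max(0.0, sum(w * x for w, x in zip(row, v)) + b)
            for row, b in zip(weights, biases)]

def angular_margin_loss(feature, class_centers, label, margin=0.5, scale=16.0):
    """ArcFace-style loss, used here as an assumed stand-in for Arc4cmr.

    The angle between the feature and its own class center is enlarged by
    `margin` before a scaled softmax cross-entropy is computed, which pushes
    same-class features together and different classes apart.
    """
    f = l2_normalize(feature)
    logits = []
    for k, center in enumerate(class_centers):
        c = l2_normalize(center)
        cos = max(-1.0, min(1.0, sum(a * b for a, b in zip(f, c))))
        if k == label:
            cos = math.cos(math.acos(cos) + margin)  # penalize the target angle
        logits.append(scale * cos)
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]

# Stand-ins for aligned image and text features of one paired sample.
image_feat = [0.9, 0.1, 0.3]
text_feat = [0.8, 0.2, 0.4]

# Hypothetical shared layer (3 -> 2 dimensions) and two class centers.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
biases = [0.0, 0.0]
centers = [[1.0, 0.0], [0.0, 1.0]]

img_common = shared_mlp(image_feat, weights, biases)
txt_common = shared_mlp(text_feat, weights, biases)

# Both modalities of the pair carry the same class label (0 here).
loss = (angular_margin_loss(img_common, centers, 0)
        + angular_margin_loss(txt_common, centers, 0))
```

Because both modalities pass through the same layer and are scored against the same class centers, paired image and text features are driven toward one region of the hypersphere; the margin term is what would tighten classes and separate them under this assumed loss form.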