Cross-modal retrieval method and retrieval system
Main authors: | , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Summary: | The invention provides a cross-modal retrieval method and system. The method comprises: encoding features with a CLIP pre-trained model to obtain the original modal features of an original image and a text; applying attention alignment to the original modal features to obtain modality-aligned data, thereby establishing semantic correlation between the original modalities; preserving the modal invariance of the aligned data through a weight-shared multi-layer perceptron; and distributing the resulting feature data on a normalized hypersphere using an Arc4cmr loss function to impose category boundary constraints. With this method, the common representations of paired images and texts are drawn as close together as possible, while intra-class compactness and inter-class separation are both enhanced. |
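The last two steps of the abstract (a weight-shared projection followed by a margin-based loss on a normalized hypersphere) can be illustrated with a minimal sketch. The abstract does not define the Arc4cmr loss, so the code below assumes a generic ArcFace-style additive angular margin as a stand-in; the function names, the `margin` and `scale` values, and the use of per-class center vectors are all hypothetical and not taken from the patent.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def shared_linear(weights, v):
    """Apply one weight matrix (rows = output units) to a feature vector.

    Passing the same `weights` for image and text features is the
    weight-sharing idea: both modalities go through identical parameters.
    """
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def angular_margin_loss(feature, class_centers, label, margin=0.5, scale=16.0):
    """Cross-entropy with an additive angular margin (assumed ArcFace-style form).

    This is NOT the patent's Arc4cmr definition, only a common loss of the
    same family, used here for illustration.
    """
    f = l2_normalize(feature)
    logits = []
    for k, center in enumerate(class_centers):
        c = l2_normalize(center)
        # Cosine of the angle between the feature and the class center,
        # clamped to guard against floating-point drift outside [-1, 1].
        cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(f, c))))
        if k == label:
            # Widen the true-class angle by `margin`, capped at pi,
            # which forces tighter clusters and wider class boundaries.
            theta = math.acos(cos_theta)
            cos_theta = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cos_theta)
    # Numerically stable negative log-softmax of the true class.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


# Usage: one shared projection for both modalities, then the loss per sample.
W = [[1.0, 0.0], [0.0, 1.0]]           # hypothetical shared weights
image_feature = shared_linear(W, [0.9, 0.1])
text_feature = shared_linear(W, [0.8, 0.2])
centers = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical class centers
total = (angular_margin_loss(image_feature, centers, 0)
         + angular_margin_loss(text_feature, centers, 0))
```

Because the margin is added only to the angle of the true class, a sample must sit closer to its own class center than plain softmax would require, which is one way to obtain the intra-class compactness and inter-class separation the abstract describes.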