Cross-modal retrieval method based on multi-granularity feature interaction



Bibliographic Details
Main Authors: DONG JIANFENG, WANG YABING, CHEN SHUJIE, YANG TAO, ZHANG MINSONG, ZHENG QI, LIU BAOLONG
Format: Patent
Language: Chinese; English
Description
Abstract: The invention discloses a cross-modal retrieval method based on multi-granularity feature interaction, used to perform mutual retrieval between videos and texts. In this method, global visual semantic features guide the local visual semantic features to extract finer-grained local information; the global features and the enhanced local features are then fused through mutual learning to obtain video-level features. The learned video-level features and the text features are mapped into a shared embedding space, where cross-modal matching is performed, thereby achieving cross-modal retrieval between text and video. This neural-network-based cross-modal retrieval method achieves a good balance between performance and complexity.
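To make the described pipeline concrete, below is a minimal PyTorch sketch of the steps the abstract outlines: global features guiding local (frame-level) features, fusion into a video-level vector, and similarity matching in a shared embedding space. The module name GlobalGuidedFusion, the dot-product gating, the fusion layer, and all dimensions are illustrative assumptions; the patent text does not specify the exact architecture, guidance mechanism, or training loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalGuidedFusion(nn.Module):
    """Hypothetical sketch of global-guided local enhancement and fusion.
    All names and mechanisms here are assumptions, not taken from the patent."""

    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Linear(dim, dim)       # projects the global feature into a guidance query
        self.fuse = nn.Linear(2 * dim, dim)   # fuses global and enhanced local features

    def forward(self, global_feat, local_feats):
        # global_feat: (B, D) video-level feature; local_feats: (B, T, D) frame-level features
        guide = self.gate(global_feat).unsqueeze(-1)                               # (B, D, 1)
        weights = torch.softmax(torch.bmm(local_feats, guide).squeeze(-1), dim=1)  # (B, T)
        # Global feature guides locals: frames relevant to the global semantics get higher weight.
        enhanced_local = (weights.unsqueeze(-1) * local_feats).sum(dim=1)          # (B, D)
        # Fuse the two granularities into a single video-level representation.
        fused = self.fuse(torch.cat([global_feat, enhanced_local], dim=-1))
        return F.normalize(fused, dim=-1)

def similarity(video_emb, text_emb):
    # Cosine similarity between all video/text pairs in the shared embedding space;
    # retrieval ranks candidates by this score.
    return F.normalize(video_emb, dim=-1) @ F.normalize(text_emb, dim=-1).t()

# Usage with random stand-in features (a real system would use learned encoders):
B, T, D = 4, 20, 512
fusion = GlobalGuidedFusion(dim=D)
video_vec = fusion(torch.randn(B, D), torch.randn(B, T, D))  # (B, D) video-level features
text_vec = F.normalize(torch.randn(B, D), dim=-1)            # stand-in for a text encoder output
scores = similarity(video_vec, text_vec)                     # (B, B) cross-modal retrieval scores
```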