RDMIF: Reverse dictionary model based on multi-modal information fusion

Bibliographic details
Published in: Neurocomputing (Amsterdam) 2025-02, Vol. 619, p. 129202, Article 129202
Main authors: Tian, Sicheng; Huang, Shaobin; Li, Rongsheng; Wei, Chi
Format: Article
Language: English
Description
Abstract: In reverse dictionary (RD) research, the goal is to construct the components of a target word vector by parsing human-comprehensible definitions. This process emulates the human lexical memory mechanism, in which the introduction of visual information is considered an efficient memory strategy. However, owing to the scarcity of multimodal datasets, existing studies have largely failed to capitalize on visual information. To bridge this research gap, we constructed a multimodal dataset and a reverse dictionary model based on multimodal information fusion (RDMIF), tailored to common objects. The dataset comprises words, images, definitions, and associated examples. RDMIF employs an attention mechanism to focus selectively on key elements within textual and visual information, and it reveals underlying semantic connections through global intra-modal and inter-modal correlation analysis. The model computes the cosine similarity between definitions and target words to evaluate explanatory power. Experimental results indicate that RDMIF improves the rank value by 1.2%, Top-1 accuracy by 3.6%, and Top-5 accuracy by 5.2% over baseline models. Our research offers new perspectives and methodologies for the multimodal RD domain.
Highlights:
• RDMIF: a novel reverse dictionary model that fuses multimodal information, enhancing word definitions with visual cues.
• Multimodal dataset: a comprehensive multimodal dataset covering 1000+ categories from ImageNet.
• Performance boost: RDMIF shows a 1.1% improvement in the key metric, rank value.
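The abstract names three mechanisms: attention over textual and visual features, cosine similarity between a definition and candidate target words, and evaluation by rank value and Top-k accuracy. The following is a minimal, hypothetical sketch of such a pipeline in plain Python; it is not the authors' RDMIF implementation. The fusion rule (text tokens attending over image-region vectors with scaled dot-product attention, then residual mean-pooling), all function names, and the metric definitions (rank as the 1-based position of the target word among candidates sorted by cosine similarity, summarized by the median as is common in reverse-dictionary evaluation; Top-k as the fraction of definitions whose target ranks within k) are assumptions. The paper's exact definition of "rank value" may differ.

```python
import math
from statistics import median

# Hypothetical sketch of an RD scoring pipeline; NOT the authors' RDMIF code.
# All vectors are plain lists of floats and are assumed to be non-zero.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(text_tokens, image_regions):
    """Scaled dot-product attention: each text token attends over image regions.

    Assumed stand-in for an inter-modal attention step; returns one
    image-context vector per text token.
    """
    d = len(text_tokens[0])
    context = []
    for t in text_tokens:
        weights = softmax([dot(t, r) / math.sqrt(d) for r in image_regions])
        context.append([sum(w * r[j] for w, r in zip(weights, image_regions))
                        for j in range(d)])
    return context


def fuse(text_tokens, image_regions):
    """Assumed fusion rule: add image context to each token, then mean-pool."""
    context = cross_attend(text_tokens, image_regions)
    n, d = len(text_tokens), len(text_tokens[0])
    return [sum(t[j] + c[j] for t, c in zip(text_tokens, context)) / n
            for j in range(d)]


def target_rank(query, vocab, target_idx):
    """1-based rank of the target word when vocab is sorted by cosine similarity."""
    scores = [cosine(query, v) for v in vocab]
    # Count candidates scoring strictly higher; ties are resolved in the target's favor.
    return 1 + sum(s > scores[target_idx] for s in scores)


def evaluate(queries, vocab, targets, ks=(1, 5)):
    """Median rank and Top-k accuracy over a set of definition vectors."""
    ranks = [target_rank(q, vocab, t) for q, t in zip(queries, targets)]
    metrics = {"median_rank": median(ranks)}
    for k in ks:
        metrics[f"top{k}"] = sum(r <= k for r in ranks) / len(ranks)
    return metrics
```

For example, with `vocab = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]`, `evaluate(vocab, vocab, [0, 1, 2])` returns `{'median_rank': 1, 'top1': 1.0, 'top5': 1.0}`, because each query vector matches its own target exactly. In a full model, the query would be the output of `fuse` mapped into the word-embedding space by a learned projection, which this sketch omits.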
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2024.129202