Faster Zero-shot Multi-modal Entity Linking via Visual-Linguistic Representation
Published in: Data intelligence 2022-07, Vol.4 (3), p.493-508
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Multi-modal entity linking plays a crucial role in a wide range of knowledge-based modal-fusion tasks, e.g., multi-modal retrieval and multi-modal event extraction. We introduce the new zero-shot multi-modal entity linking (ZEMEL) task. Its format is similar to multi-modal entity linking, but multi-modal mentions are linked to unseen entities in the knowledge graph; the purpose of the zero-shot setting is to achieve robust linking in highly specialized domains. At the same time, the inference efficiency of existing models is low when there are many candidate entities. For this reason, we propose a novel model that leverages visual-linguistic representation through a co-attentional mechanism to deal with the ZEMEL task, considering the trade-off between the performance and efficiency of the model. We also build a dataset named ZEMELD for the new task, which contains multi-modal data resources collected from Wikipedia, and we annotate the entities as ground truth. Extensive experimental results on the dataset show that our proposed model is effective, as it significantly improves precision from 68.93% to 82.62% compared with baselines on the ZEMEL task.
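For readers who want a concrete picture of what co-attentional visual-linguistic scoring of candidate entities might look like, here is a minimal PyTorch sketch. All module names, dimensions, and the dot-product candidate scoring below are illustrative assumptions, not the published model's actual architecture.

```python
# Hypothetical sketch of a co-attentional visual-linguistic scorer for
# zero-shot multi-modal entity linking. Names, dimensions, and the scoring
# function are assumptions for illustration, not the paper's exact design.
import torch
import torch.nn as nn


class CoAttentionScorer(nn.Module):
    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        # Cross-attention in both directions: text tokens attend to image
        # regions, and image regions attend to text tokens.
        self.txt2img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, text_feats, image_feats, entity_embs):
        # text_feats:  (B, T, dim) token embeddings of the mention + context
        # image_feats: (B, R, dim) region features of the mention image
        # entity_embs: (B, C, dim) embeddings of C candidate entities
        t_attn, _ = self.txt2img(text_feats, image_feats, image_feats)
        v_attn, _ = self.img2txt(image_feats, text_feats, text_feats)
        # Mean-pool each stream and fuse into one mention representation.
        mention = self.fuse(torch.cat([t_attn.mean(1), v_attn.mean(1)], dim=-1))
        # Score every candidate with a dot product; unseen entities only
        # require an embedding, which is what makes a zero-shot setup possible.
        return torch.einsum("bd,bcd->bc", mention, entity_embs)


if __name__ == "__main__":
    scorer = CoAttentionScorer()
    scores = scorer(torch.randn(2, 32, 768),   # text tokens
                    torch.randn(2, 36, 768),   # image regions
                    torch.randn(2, 10, 768))   # candidate entities
    print(scores.shape)  # torch.Size([2, 10])
```

Precomputing the candidate entity embeddings offline and reusing them across mentions is one common way such a design keeps inference cheap when the candidate set is large.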
ISSN: 2641-435X
DOI: 10.1162/dint_a_00146