Introducing high correlation and high quality instances for few-shot entity linking

Bibliographic details
Published in: Neural Networks, 2025-01, Vol. 181, p. 106783, Article 106783
Main authors: Sui, Xuhui; Zhang, Ying; Song, Kehui; Zhou, Baohang; Yuan, Xiaojie
Format: Article
Language: English
Online access: Full text
Description
Summary: Entity linking, the process of connecting textual mentions in documents to canonical entities within a knowledge base, plays an integral role in a myriad of natural language processing tasks. A significant challenge in the field is the scarcity of resources, particularly for specialized domains, which accentuates the importance of few-shot entity linking in real-world scenarios. Previous works address the lack of in-domain labeled data by generating synthetic data. However, we argue that the synthetic data is frequently far from high quality, and such low-quality instances introduce noise and diminish the ability of entity linking models to comprehend the semantic consistency between mentions and entities. In this paper, we propose the H2FEL framework to introduce high-correlation and high-quality instances for few-shot entity linking. We argue that there is abundant high-quality labeled data in general domains and that some of it is highly correlated with the target domain. Thus, we first design an adversarial instance extraction module to extract such high-correlation instances without depending on additional manually annotated data. To further mitigate the negative effects of low-correlation instances, we train our entity linking model via a variant of curriculum learning. Experimental results on the few-shot entity linking dataset demonstrate the effectiveness of the proposed H2FEL framework, which achieves state-of-the-art performance.
• First prioritize the correlation of instances to the target domain.
• Adversarial extraction of high-correlation instances from high-quality data.
• Curriculum learning variant to mitigate the negative effect of low-correlation instances.
• Micro accuracy on the few-shot entity linking dataset improved by 19.47%.
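
The summary does not say how the curriculum learning variant schedules instances, so the following is only a generic illustration of the idea of ordering training data by correlation to the target domain, not the H2FEL method itself. The `Instance` class, the `correlation` score (imagined here as the output of some domain-correlation scorer) and the `curriculum_stages` function are all hypothetical names introduced for this sketch. It assumes that training starts with the most highly correlated instances and gradually admits less correlated ones.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """A general-domain training example (hypothetical structure)."""
    mention: str
    entity: str
    correlation: float  # assumed score: higher means closer to the target domain


def curriculum_stages(instances: List[Instance], num_stages: int) -> List[List[Instance]]:
    """Build cumulative training stages, most correlated instances first.

    Stage 1 holds the top 1/num_stages of the ranked instances; each later
    stage adds the next slice, so the final stage holds every instance.
    This is an assumed schedule, not the one used in the paper.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ranked = sorted(instances, key=lambda inst: inst.correlation, reverse=True)
    stages = []
    for stage in range(1, num_stages + 1):
        cutoff = math.ceil(len(ranked) * stage / num_stages)
        stages.append(ranked[:cutoff])
    return stages


if __name__ == "__main__":
    # Toy data with made-up scores, for illustration only.
    data = [
        Instance("Jaguar", "Jaguar_Cars", 0.9),
        Instance("Paris", "Paris_(France)", 0.2),
        Instance("Mercury", "Mercury_(planet)", 0.5),
        Instance("Apple", "Apple_Inc.", 0.7),
    ]
    for number, stage in enumerate(curriculum_stages(data, num_stages=2), start=1):
        print(number, [inst.mention for inst in stage])
```

A model would then be trained on each stage in turn. Whether stages should be cumulative, and in which direction the ordering should run, are design choices that the summary leaves open.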
ISSN:0893-6080
1879-2782
DOI:10.1016/j.neunet.2024.106783