A joint entity and relation extraction framework for handling negative samples problems in named entity recognition

Bibliographic details
Published in:	Applied soft computing 2025-01, Vol.169, p.112570, Article 112570
Main authors:	Zhang, Hongbin; Lin, Guangyu; Chen, Kezhou; Lin, Nankai; Cheng, Lianglun; Yang, Aimin
Format:	Article
Language:	English
Subjects:	Entity recognition; Multi-task learning; Negative samples; Relation extraction; Science IE; Self-paced learning
Online access:	Full text

Description
Abstract:	Scientific articles and reports contain a wide range of domain-specific knowledge in the form of entities and the relations between them. In recent years, such knowledge, including overlapping entities, has been extracted mainly by span-based joint extraction methods, which aim to collect key semantic information about text topics and to help researchers understand the texts. Existing span-based joint extraction methods have focused mainly on acquiring more comprehensive span embeddings for entity classification, while the problems of data imbalance and hard negative samples have not been fully explored, leading to incorrect entity and relation classification in knowledge extraction. To this end, we propose a joint entity and relation extraction framework (JEREF) that better learns negative and positive samples in entity classification. Specifically, JEREF provides a binary boundary predictor to learn positive sample boundaries, together with a learning strategy that combines self-paced learning and span-level contrastive learning to balance the data distribution and distinguish hard negative samples. Our framework is evaluated strictly on the SciERC and ADE datasets, outperforming other state-of-the-art methods and achieving micro-F1 scores of 42.15% and 83.24% on joint extraction, respectively.
• We propose a binary boundary predictor to learn entity boundaries independently.
• We utilize self-paced learning to learn negative samples from easy to hard.
• We learn better span embeddings of samples through span-level contrastive learning.
• We evaluate JEREF on two datasets, demonstrating its effectiveness.
ISSN:	1568-4946
DOI:	10.1016/j.asoc.2024.112570
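
The abstract describes self-paced learning as learning negative samples "from easy to hard" but gives no formulation. As a rough illustration only, the sketch below uses the classic hard-threshold variant of self-paced learning, in which a sample takes part in training once its loss falls below a threshold that grows over time. It is not taken from the paper: the function names, the threshold values, and the multiplicative growth schedule are all assumptions, and the per-sample losses are held fixed here although a real training loop would recompute them every round.

```python
def self_paced_weights(losses, threshold):
    """Hard-threshold self-paced weights (assumed variant, not JEREF's own).

    A sample whose loss is below the threshold counts as "easy" and gets
    weight 1.0; every other sample gets weight 0.0 and is skipped for now.
    """
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def self_paced_schedule(losses, start_threshold, growth, num_rounds):
    """Return (threshold, weights) for each round as the threshold grows.

    Multiplying the threshold by `growth` each round gradually admits
    harder samples. The losses stay fixed only to keep the sketch small.
    """
    threshold = start_threshold
    schedule = []
    for _ in range(num_rounds):
        schedule.append((threshold, self_paced_weights(losses, threshold)))
        threshold *= growth
    return schedule


if __name__ == "__main__":
    # Hypothetical per-sample losses for four negative spans.
    example_losses = [0.1, 0.4, 0.9, 2.5]
    for threshold, weights in self_paced_schedule(example_losses, 0.5, 2.0, 4):
        print(threshold, weights)
```

With these made-up losses, the two lowest-loss samples are selected in the first round and the highest-loss sample only enters in the last round, which is the easy-to-hard ordering the abstract refers to.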