An Entity-Relation Joint Extraction Method Based on Two Independent Sub-Modules From Unstructured Text

Bibliographic Details
Published in: IEEE Access, 2023, Vol. 11, pp. 122154-122163
Main authors: Liu, Su; Lyu, Wenqi; Ma, Xiao; Ge, Jike
Format: Article
Language: English
Online access: Full text
Description
Abstract: Extracting entity, relation, and attribute information from unstructured text is crucial for constructing large-scale knowledge graphs (KG). Existing research approaches either focus on entity recognition before relation extraction or employ unified annotation. However, these methods overlook the intrinsic relation between entity recognition and relation extraction, resulting in ineffective handling of triple overlap issues where multiple relations share the same entity in a sentence. To address these challenges, this paper proposes an entity-relation joint extraction model comprising two independent sub-modules: one for extracting the head entity and the other for extracting the tail entity and its corresponding relation. The model generates candidate entities and relations by enumerating token sequences in sentences, and then uses the two sub-modules to predict entities and relations. The predicted entities and relations are jointly decoded to obtain relational triples, avoiding error propagation and addressing redundancy, entity overlap, and poor generalization. Extensive experiments demonstrate that our model achieves state-of-the-art performance on the WebNLG, NYT, WebNLG*, and NYT* public benchmarks. It outperforms all baselines on the WebNLG* dataset, improving on ETL-Span by 3.8%, 2.9%, and 5.5% for Normal, SEO, and EPO triples, respectively. On the NYT* dataset, our method improves by 5.7% on Normal-type triples, thereby confirming its effectiveness.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3328802
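
The abstract describes generating candidates by enumerating token sequences, scoring them with a head-entity sub-module and a tail-entity-plus-relation sub-module, and jointly decoding the predictions into triples. The following is a minimal sketch of that pipeline shape only, not the authors' implementation: the paper's sub-modules are learned models, whereas here they are replaced by hypothetical lookup tables (`HEADS`, `RELATIONS`), and the function names, the `max_len` cap, and the example sentence are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]          # inclusive token indices (start, end)
Triple = Tuple[str, str, str]   # (head, relation, tail)


def enumerate_spans(tokens: List[str], max_len: int = 3) -> List[Span]:
    """Enumerate every contiguous token span of at most max_len tokens."""
    return [
        (i, j)
        for i in range(len(tokens))
        for j in range(i, min(i + max_len, len(tokens)))
    ]


def decode_triples(
    tokens: List[str],
    is_head: Callable[[str], bool],
    tail_relations: Callable[[str, str], List[str]],
    max_len: int = 3,
) -> List[Triple]:
    """Combine the two predictors' outputs into (head, relation, tail) triples."""
    spans = enumerate_spans(tokens, max_len)

    def text(span: Span) -> str:
        return " ".join(tokens[span[0]: span[1] + 1])

    triples: List[Triple] = []
    for h in spans:
        if not is_head(text(h)):        # stand-in for the head-entity sub-module
            continue
        for t in spans:
            if t == h:
                continue
            # stand-in for the tail-entity-and-relation sub-module
            for rel in tail_relations(text(h), text(t)):
                triples.append((text(h), rel, text(t)))
    return triples


# Hypothetical toy "predictions"; a real system would use trained classifiers.
HEADS = {"Alice"}
RELATIONS = {
    ("Alice", "Acme"): ["founder_of"],
    ("Alice", "Berlin"): ["lives_in"],
}

tokens = "Alice founded Acme and lives in Berlin".split()
triples = decode_triples(
    tokens,
    is_head=lambda s: s in HEADS,
    tail_relations=lambda h, t: RELATIONS.get((h, t), []),
)
print(triples)
```

In this toy sentence both triples share the head entity "Alice", which is the kind of overlap the abstract refers to when multiple relations share the same entity. Because every span is scored independently, a shared entity does not have to be assigned a single tag, though the number of candidates grows with sentence length and `max_len`.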