Relation Extraction: Hypernymy Discovery Using a Novel Pattern Learning Algorithm
Published in: | SN Computer Science 2023-11, Vol.4 (6), p.730, Article 730 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | This paper proposes a semi-supervised relation extraction methodology for extracting hypernymy (Is-A) relations. We developed a pattern-learning model built around a "most reliable pattern". Taking only a text corpus and a set of seed instances as input, the algorithm generates trusted hypernym–hyponym pairs after each iteration. Sentences are masked and extracted, and patterns are discovered and ranked. A pattern-matching algorithm generates candidate pairs, and a scoring function filters them. The generated pairs are added to the initial seed set via a bootstrapping approach, so that the next iteration can generate a new set of trusted pairs. To facilitate the experiments, we use two freely available public Wikipedia text corpora to extract hypernyms. As baselines for comparison with our pattern learning approach, we use Hearst patterns, an extended version of Hearst patterns (with additional patterns), and a dependency-based approach. To evaluate the proposed algorithm, the extracted hypernym–hyponym relations are tested against five standard publicly available datasets: BLESS, WBLESS, WEEDS, EVAL, and LEDS. The results on the two Wikipedia corpora and five evaluation datasets show that the pattern learning approach performs better than the three baseline algorithms. The lack of heavy skew in results across the two corpora also indicates that the implemented algorithms are independent of the corpus used and can be applied to any large corpus. |
---|---|
ISSN: | 2662-995X 2661-8907 |
DOI: | 10.1007/s42979-023-02161-w |
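
The bootstrapping loop outlined in the abstract (mask seed pairs in sentences, rank the discovered patterns, match the best pattern against the corpus, filter the resulting pairs, and grow the seed set) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy corpus, the "reliability" measure (number of distinct seed pairs supporting a pattern), and the frequency-based filter are all assumptions made for illustration, since the abstract does not specify them.

```python
import re
from collections import Counter, defaultdict

def learn_patterns(sentences, seeds):
    """Mask seed pairs in sentences; map each pattern to the seeds supporting it.

    A pattern is (first_slot, middle_text), e.g. ("hypo", " is a kind of ").
    Assumption: patterns are the 1-5 words between the two masked terms.
    """
    support = defaultdict(set)
    for sent in sentences:
        for hypo, hyper in seeds:
            h, H = re.escape(hypo), re.escape(hyper)
            # Try both orders: hyponym first, then hypernym first.
            for first, a, b in (("hypo", h, H), ("hyper", H, h)):
                m = re.search(rf"\b{a}\b(\s(?:\w+\s){{1,5}}?)\b{b}\b", sent)
                if m:
                    support[(first, m.group(1))].add((hypo, hyper))
    return support

def match_pattern(pattern, sentences):
    """Apply one pattern to the corpus and count the (hyponym, hypernym) pairs found."""
    first, middle = pattern
    second = "hyper" if first == "hypo" else "hypo"
    regex = re.compile(
        rf"\b(?P<{first}>\w+){re.escape(middle)}(?P<{second}>\w+)\b"
    )
    counts = Counter()
    for sent in sentences:
        for m in regex.finditer(sent):
            counts[(m.group("hypo").lower(), m.group("hyper").lower())] += 1
    return counts

def bootstrap(sentences, seeds, iterations=5, min_count=1):
    """Grow the seed set with pairs produced by the most reliable pattern."""
    seeds = set(seeds)
    for _ in range(iterations):
        support = learn_patterns(sentences, seeds)
        if not support:
            break
        # Assumed reliability score: number of distinct seed pairs behind a pattern.
        best = max(support, key=lambda p: len(support[p]))
        counts = match_pattern(best, sentences)
        # Assumed scoring function: keep pairs seen at least min_count times.
        trusted = {pair for pair, n in counts.items() if n >= min_count}
        new_pairs = trusted - seeds
        if not new_pairs:
            break  # nothing new was found, so further iterations cannot help
        seeds |= new_pairs
    return seeds

# Toy corpus and seed set (hypothetical data, not taken from the paper).
sentences = [
    "A dog is a kind of animal.",
    "A rose is a kind of flower.",
    "A sparrow is a kind of bird.",
    "An oak is a kind of tree.",
    "The bird sat near the tree.",
]
seeds = {("dog", "animal"), ("rose", "flower")}
pairs = bootstrap(sentences, seeds)
print(sorted(pairs))
```

On this toy corpus the two seed pairs both support the pattern "X is a kind of Y", which then yields the additional pairs for "sparrow" and "oak". A realistic system would need lemmatization, multi-word terms, and a stricter scoring function to limit semantic drift across iterations.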