Exploiting Parallel News Streams for Unsupervised Event Extraction

Bibliographic details
Published in: Transactions of the Association for Computational Linguistics 2021-03, Vol.3, p.117-129
Main authors: Zhang, Congle; Soderland, Stephen; Weld, Daniel S.
Format: Article
Language: eng
Description
Abstract: Most approaches to relation extraction (RE), the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as Freebase rarely cover event relations. Thus, the problem of extracting a wide range of events (e.g., from news streams) is an important, open challenge. This paper introduces NewsSpike-RE, a novel, unsupervised algorithm that discovers event relations and then learns to extract them. NewsSpike-RE uses a novel probabilistic graphical model to cluster sentences describing similar events from parallel news streams. These clusters then comprise training data for the extractor. Our evaluation shows that NewsSpike-RE generates high quality training sentences and learns extractors that perform much better than rival approaches, more than doubling the area under a precision-recall curve compared to Universal Schemas.
ISSN: 2307-387X
DOI: 10.1162/tacl_a_00127