Exploiting Parallel News Streams for Unsupervised Event Extraction
Published in: | Transactions of the Association for Computational Linguistics 2021-03, Vol.3, p.117-129 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Most approaches to relation extraction, the task of extracting ground facts from natural language text, are based on machine learning and thus starved by scarce training data. Manual annotation is too expensive to scale to a comprehensive set of relations. Distant supervision, which automatically creates training data, only works with relations that already populate a knowledge base (KB). Unfortunately, KBs such as FreeBase rarely cover event relations. Thus, the problem of extracting a wide range of events (e.g., from news streams) is an important, open challenge. This paper introduces NewsSpike-RE, a novel, unsupervised algorithm that discovers event relations and then learns to extract them. NewsSpike-RE uses a novel probabilistic graphical model to cluster sentences describing similar events from parallel news streams. These clusters then comprise training data for the extractor. Our evaluation shows that NewsSpike-RE generates high quality training sentences and learns extractors that perform much better than rival approaches, more than doubling the area under a precision-recall curve compared to Universal Schemas. |
---|---|
ISSN: | 2307-387X |
DOI: | 10.1162/tacl_a_00127 |