AGRR-2019: A Corpus for Gapping Resolution in Russian
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | This paper provides a comprehensive overview of the gapping dataset for
Russian that consists of 7.5k sentences with gapping (as well as 15k relevant
negative sentences) and comprises data from various genres: news, fiction,
social media and technical texts. The dataset was prepared for the Automatic
Gapping Resolution Shared Task for Russian (AGRR-2019), a competition aimed at
stimulating the development of NLP tools and methods for processing
ellipsis.
In this paper, we pay special attention to the gapping resolution methods
that were introduced within the shared task as well as an alternative test set
that illustrates that our corpus is a diverse and representative subset of
Russian-language gapping, sufficient for the effective use of machine
learning techniques. |
---|---|
DOI: | 10.48550/arxiv.1906.04099 |