ParaShoot: A Hebrew Question Answering Dataset
Main authors: | , |
---|---|
Format: | Article |
Language: | English |
Summary: | NLP research in Hebrew has largely focused on morphology and syntax, where
rich annotated datasets in the spirit of Universal Dependencies are available.
Semantic datasets, however, are in short supply, hindering crucial advances in
the development of NLP technology in Hebrew. In this work, we present
ParaShoot, the first question answering dataset in modern Hebrew. The dataset
follows the format and crowdsourcing methodology of SQuAD, and contains
approximately 3000 annotated examples, similar to other question-answering
datasets in low-resource languages. We provide the first baseline results using
recently released BERT-style models for Hebrew, showing that there is
significant room for improvement on this task. |
DOI: | 10.48550/arxiv.2109.11314 |