Improving mention detection for Basque based on a deep error analysis

Bibliographic details
Published in:	Natural Language Engineering, 2017-05, Vol. 23 (3), pp. 351-384
Main authors:	SORALUZE, ANDER; ARREGI, OLATZ; ARREGI, XABIER; DÍAZ DE ILARRAZA, ARANTZA
Format:	Article
Language:	English
Subjects:	Accuracy; Artificial intelligence; Basque language; Computational linguistics; Error analysis; Error detection; Language; Linguistics; Matching; Natural language processing; Semantics; Sensors; Standard data

Description
Summary:	This paper presents the improvement process of a mention detector for Basque. The system is rule-based and takes into account the characteristics of mentions in Basque. A classification of error types is proposed based on the errors that occur during mention detection. A deep error analysis distinguishing error types and causes is presented and improvements are proposed. At the final stage, the system obtains an F-measure of 74.57% under the Exact Matching protocol and of 80.57% under Lenient Matching. We also show the performance of the mention detector with gold standard data as input, in order to omit errors caused by the previous stages of linguistic processing. In this scenario, we obtain an F-measure of 85.89% with Strict Matching and of 89.06% with Lenient Matching, i.e., a difference of 11.32 and 8.49 percentage points, respectively. Finally, how improvements in mention detection affect coreference resolution is analysed.
ISSN:	1351-3249 1469-8110
DOI:	10.1017/S1351324916000206