ERAS: Improving the quality control in the annotation process for Natural Language Processing tasks
Saved in:
Published in: Information Systems (Oxford), 2020-11, Vol. 93, p. 101553, Article 101553
Main authors:
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: The increasing amount of valuable, unstructured textual information poses a major challenge for extracting value from those texts. We need to use NLP (Natural Language Processing) techniques, most of which rely on manually annotating a large corpus of text for their development and evaluation. Creating a large annotated corpus is laborious and requires suitable computational support. There are many annotation tools available, but their main weaknesses are the absence of data management features for quality control and the need for a commercial license. As the quality of the data used to train an NLP model directly affects the quality of the results, the quality control of the annotations is essential. In this paper, we introduce ERAS, a novel web-based text annotation tool developed to facilitate and manage the process of text annotation. ERAS includes not only the key features of current mainstream annotation systems but also other features necessary to improve the curation process, such as inter-annotator agreement, self-agreement, and annotation log visualization, for annotation quality control. ERAS also implements a series of features to improve the customization of the user's annotation workflow, such as random document selection, re-annotation stages, and warm-up annotations. We conducted two empirical studies to evaluate the tool's support for text annotation, and the results suggest that the tool not only meets the basic needs of the annotation task but also has some important advantages over the other tools evaluated in the studies. ERAS is freely available at https://github.com/grosmanjs/eras.
• We propose ERAS, a new text annotation/curation tool for NLP tasks.
• ERAS is an ontology-based annotation tool for labeling entities and relations.
• To improve quality control, all actions performed by the annotators are logged.
• In our studies, ERAS improved on several aspects compared to the other tools evaluated.
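The inter-annotator agreement cited above as a quality-control feature is commonly quantified with a chance-corrected statistic such as Cohen's kappa. The sketch below is illustrative only, not ERAS's actual implementation (which this record does not show); the function name and label set are assumptions:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, assuming the two annotators label independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two annotators labeling the same five text spans (hypothetical entity tags).
a = ["PER", "ORG", "PER", "LOC", "PER"]
b = ["PER", "ORG", "LOC", "LOC", "PER"]
print(round(cohens_kappa(a, b), 3))  # → 0.688
```

Self-agreement, also mentioned in the abstract, can reuse the same statistic by comparing one annotator's labels against their own labels from a repeated (re-annotation) pass.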
ISSN: 0306-4379, 1873-6076
DOI: 10.1016/j.is.2020.101553