On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions

Bibliographic details
Published in: Journal of biomedical informatics, 2015-08, Vol. 56, p. 318-332
Main authors: Oronoz, Maite; Gojenola, Koldo; Pérez, Alicia; de Ilarraza, Arantza Díaz; Casillas, Arantza
Format: Article
Language: English
Description
Summary:
•Creation of a gold standard of electronic health records in Spanish.
•Annotation of diseases, drugs and adverse drug reaction (ADR) events.
•Quality: inter-annotator agreement of 90.53% for entities and 82.86% for events.
•Development and assessment of a linguistic analyzer for medical texts.
•Automatic ADR extraction with machine learning shows the potential of the corpus.

Advances in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts rely on annotated corpora, but these are scarce in the clinical domain because of legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard, composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities that indicate adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus was assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2015.06.016