A corpus to support eHealth Knowledge Discovery technologies

Detailed description

Bibliographic details
Published in:	Journal of Biomedical Informatics, 2019-06, Vol. 94, p. 103172, Article 103172
Main authors:	Piad-Morffis, Alejandro; Gutiérrez, Yoan; Muñoz, Rafael
Format:	Article
Language:	English
Subjects:	Corpus; eHealth; Knowledge discovery; Spanish; Subject-Verb-Object
Online access:	Full text

Description
Abstract:	Highlights:
• A general annotation schema based on the Subject-Verb-Object structure is provided.
• A manually annotated corpus in the Spanish health domain is developed.
• A set of basic techniques is provided, which can be used as a comparison baseline.
• Source code and data are made available to encourage the development of HLTs.

This paper presents and describes the eHealth-KD corpus. The corpus is a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of the content, without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the process of annotation and provides key metrics of the corpus. Finally, three baseline implementations, which are supported by machine learning models, were designed to consider the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018) and the findings obtained by participants are discussed. The eHealth-KD corpus provides the first step in the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains.
ISSN:	1532-0464, 1532-0480
DOI:	10.1016/j.jbi.2019.103172