A corpus to support eHealth Knowledge Discovery technologies
Published in: | Journal of biomedical informatics 2019-06, Vol.94, p.103172-103172, Article 103172 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: |
• A general annotation schema based on the Subject-Verb-Object structure is provided.
• A manually annotated corpus in the Spanish health domain is developed.
• A set of basic techniques is provided, which can be used as a comparison baseline.
• Source code and data are made available to encourage the development of human language technologies (HLTs).
This paper presents the eHealth-KD corpus, a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of the content without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the annotation process and reports key metrics of the corpus. Finally, three baseline implementations supported by machine learning models are presented, designed to assess the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018), and the findings obtained by participants are discussed. The eHealth-KD corpus represents a first step toward the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains. |
---|---|
ISSN: | 1532-0464 1532-0480 |
DOI: | 10.1016/j.jbi.2019.103172 |