A corpus to support eHealth Knowledge Discovery technologies

• A general annotation schema based on the Subject-Verb-Object structure is provided.
• A manually annotated corpus in the Spanish health domain is developed.
• A set of basic techniques is provided, which can be used as a comparison baseline.
• Source code and data are made available to...

Detailed description

Bibliographic details
Published in: Journal of biomedical informatics 2019-06, Vol.94, p.103172-103172, Article 103172
Main authors: Piad-Morffis, Alejandro, Gutiérrez, Yoan, Muñoz, Rafael
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 103172
container_issue
container_start_page 103172
container_title Journal of biomedical informatics
container_volume 94
creator Piad-Morffis, Alejandro
Gutiérrez, Yoan
Muñoz, Rafael
description • A general annotation schema based on the Subject-Verb-Object structure is provided. • A manually annotated corpus in the Spanish health domain is developed. • A set of basic techniques is provided, which can be used as a comparison baseline. • Source code and data are made available to encourage the development of HLTs. This paper presents and describes the eHealth-KD corpus. The corpus is a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of the content, without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the process of annotation and provides key metrics of the corpus. Finally, three baseline implementations, which are supported by machine learning models, were designed to consider the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018) and the findings obtained by participants are discussed. The eHealth-KD corpus provides the first step in the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains.
doi_str_mv 10.1016/j.jbi.2019.103172
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2207164722</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1532046419300905</els_id><sourcerecordid>2207164722</sourcerecordid><originalsourceid>FETCH-LOGICAL-c396t-4abfaae5db773690d65475c6bbba7f24314602829c3426a1d817e7ea3bbaf00e3</originalsourceid><addsrcrecordid>eNp9kMtOwzAQRS0EoqXwAWxQlmxS_EjsRrCpyqOISmxgbTnOpHWUxsFOivr3uAp0yWpmNPdezRyErgmeEkz4XTWtcjOlmGRhZkTQEzQmKaMxTmb49NjzZIQuvK8wJiRN-TkaMZzxlDA-Rg_zSFvX9j7qbOT7trWui2AJqu420Vtjv2so1hA9Gq_tDtw-6kBvGlvbtQF_ic5KVXu4-q0T9Pn89LFYxqv3l9fFfBVrlvEuTlReKgVpkQvBeIYLniYi1TzPcyVKmjCScExnNNMsoVyRYkYECFAs7EuMgU3Q7ZDbOvvVg-_kNtwDda0asL2XlGJBeCIoDVIySLWz3jsoZevMVrm9JFgeoMlKBmjyAE0O0ILn5je-z7dQHB1_lILgfhBAeHJnwEmvDTQaCuNAd7Kw5p_4H0KKfAA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2207164722</pqid></control><display><type>article</type><title>A corpus to support eHealth Knowledge Discovery technologies</title><source>Elsevier ScienceDirect Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Piad-Morffis, Alejandro ; Gutiérrez, Yoan ; Muñoz, Rafael</creator><creatorcontrib>Piad-Morffis, Alejandro ; Gutiérrez, Yoan ; Muñoz, Rafael</creatorcontrib><description>[Display omitted] •A general annotation schema based on the Subject-Verb-Object structure is provided.•A manually annotated corpus in the Spanish health domain is developed.•A set of basic techniques are provided, which can be used as a comparison baseline.•Source code and data are made available to encourage the development of HLTs. This paper presents and describes eHealth-KD corpus. The corpus is a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of the content, without resorting to domain-specific labels. 
The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the process of annotation and provides key metrics of the corpus. Finally, three baseline implementations, which are supported by machine learning models, were designed to consider the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018) and the findings obtained by participants are discussed. The eHealth-KD corpus provides the first step in the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains.</description><identifier>ISSN: 1532-0464</identifier><identifier>EISSN: 1532-0480</identifier><identifier>DOI: 10.1016/j.jbi.2019.103172</identifier><identifier>PMID: 30965136</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Corpus ; eHealth ; Knowledge discovery ; Spanish ; Subject-Verb-Object</subject><ispartof>Journal of biomedical informatics, 2019-06, Vol.94, p.103172-103172, Article 103172</ispartof><rights>2019 Elsevier Inc.</rights><rights>Copyright © 2019 Elsevier Inc. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c396t-4abfaae5db773690d65475c6bbba7f24314602829c3426a1d817e7ea3bbaf00e3</citedby><cites>FETCH-LOGICAL-c396t-4abfaae5db773690d65475c6bbba7f24314602829c3426a1d817e7ea3bbaf00e3</cites><orcidid>0000-0002-4052-7427</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S1532046419300905$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30965136$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Piad-Morffis, Alejandro</creatorcontrib><creatorcontrib>Gutiérrez, Yoan</creatorcontrib><creatorcontrib>Muñoz, Rafael</creatorcontrib><title>A corpus to support eHealth Knowledge Discovery technologies</title><title>Journal of biomedical informatics</title><addtitle>J Biomed Inform</addtitle><description>[Display omitted] •A general annotation schema based on the Subject-Verb-Object structure is provided.•A manually annotated corpus in the Spanish health domain is developed.•A set of basic techniques are provided, which can be used as a comparison baseline.•Source code and data are made available to encourage the development of HLTs. This paper presents and describes eHealth-KD corpus. The corpus is a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of the content, without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the process of annotation and provides key metrics of the corpus. 
Finally, three baseline implementations, which are supported by machine learning models, were designed to consider the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018) and the findings obtained by participants are discussed. The eHealth-KD corpus provides the first step in the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains.</description><subject>Corpus</subject><subject>eHealth</subject><subject>Knowledge discovery</subject><subject>Spanish</subject><subject>Subject-Verb-Object</subject><issn>1532-0464</issn><issn>1532-0480</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNp9kMtOwzAQRS0EoqXwAWxQlmxS_EjsRrCpyqOISmxgbTnOpHWUxsFOivr3uAp0yWpmNPdezRyErgmeEkz4XTWtcjOlmGRhZkTQEzQmKaMxTmb49NjzZIQuvK8wJiRN-TkaMZzxlDA-Rg_zSFvX9j7qbOT7trWui2AJqu420Vtjv2so1hA9Gq_tDtw-6kBvGlvbtQF_ic5KVXu4-q0T9Pn89LFYxqv3l9fFfBVrlvEuTlReKgVpkQvBeIYLniYi1TzPcyVKmjCScExnNNMsoVyRYkYECFAs7EuMgU3Q7ZDbOvvVg-_kNtwDda0asL2XlGJBeCIoDVIySLWz3jsoZevMVrm9JFgeoMlKBmjyAE0O0ILn5je-z7dQHB1_lILgfhBAeHJnwEmvDTQaCuNAd7Kw5p_4H0KKfAA</recordid><startdate>201906</startdate><enddate>201906</enddate><creator>Piad-Morffis, Alejandro</creator><creator>Gutiérrez, Yoan</creator><creator>Muñoz, Rafael</creator><general>Elsevier Inc</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-4052-7427</orcidid></search><sort><creationdate>201906</creationdate><title>A corpus to support eHealth Knowledge Discovery technologies</title><author>Piad-Morffis, Alejandro ; Gutiérrez, Yoan ; Muñoz, 
Rafael</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c396t-4abfaae5db773690d65475c6bbba7f24314602829c3426a1d817e7ea3bbaf00e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Corpus</topic><topic>eHealth</topic><topic>Knowledge discovery</topic><topic>Spanish</topic><topic>Subject-Verb-Object</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Piad-Morffis, Alejandro</creatorcontrib><creatorcontrib>Gutiérrez, Yoan</creatorcontrib><creatorcontrib>Muñoz, Rafael</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of biomedical informatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Piad-Morffis, Alejandro</au><au>Gutiérrez, Yoan</au><au>Muñoz, Rafael</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A corpus to support eHealth Knowledge Discovery technologies</atitle><jtitle>Journal of biomedical informatics</jtitle><addtitle>J Biomed Inform</addtitle><date>2019-06</date><risdate>2019</risdate><volume>94</volume><spage>103172</spage><epage>103172</epage><pages>103172-103172</pages><artnum>103172</artnum><issn>1532-0464</issn><eissn>1532-0480</eissn><abstract>[Display omitted] •A general annotation schema based on the Subject-Verb-Object structure is provided.•A manually annotated corpus in the Spanish health domain is developed.•A set of basic techniques are provided, which can be used as a comparison baseline.•Source code and data are made available to encourage the development of HLTs. This paper presents and describes eHealth-KD corpus. 
The corpus is a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of the content, without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the process of annotation and provides key metrics of the corpus. Finally, three baseline implementations, which are supported by machine learning models, were designed to consider the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018) and the findings obtained by participants are discussed. The eHealth-KD corpus provides the first step in the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>30965136</pmid><doi>10.1016/j.jbi.2019.103172</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-4052-7427</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1532-0464
ispartof Journal of biomedical informatics, 2019-06, Vol.94, p.103172-103172, Article 103172
issn 1532-0464
1532-0480
language eng
recordid cdi_proquest_miscellaneous_2207164722
source Elsevier ScienceDirect Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Corpus
eHealth
Knowledge discovery
Spanish
Subject-Verb-Object
title A corpus to support eHealth Knowledge Discovery technologies
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T18%3A57%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20corpus%20to%20support%20eHealth%20Knowledge%20Discovery%20technologies&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=Piad-Morffis,%20Alejandro&rft.date=2019-06&rft.volume=94&rft.spage=103172&rft.epage=103172&rft.pages=103172-103172&rft.artnum=103172&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2019.103172&rft_dat=%3Cproquest_cross%3E2207164722%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2207164722&rft_id=info:pmid/30965136&rft_els_id=S1532046419300905&rfr_iscdi=true