A corpus to support eHealth Knowledge Discovery technologies
• A general annotation schema based on the Subject-Verb-Object structure is provided.
• A manually annotated corpus in the Spanish health domain is developed.
• A set of basic techniques is provided, which can be used as a comparison baseline.
• Source code and data are made available to...
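Published in: | Journal of biomedical informatics 2019-06, Vol.94, p.103172-103172, Article 103172 |
---|---|
Main authors: | Piad-Morffis, Alejandro; Gutiérrez, Yoan; Muñoz, Rafael |
Format: | Article |
Language: | English |
Subjects: | Corpus; eHealth; Knowledge discovery; Spanish; Subject-Verb-Object |
Online access: | Full text |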
container_end_page | 103172 |
---|---|
container_issue | |
container_start_page | 103172 |
container_title | Journal of biomedical informatics |
container_volume | 94 |
creator | Piad-Morffis, Alejandro; Gutiérrez, Yoan; Muñoz, Rafael |
description | • A general annotation schema based on the Subject-Verb-Object structure is provided. • A manually annotated corpus in the Spanish health domain is developed. • A set of basic techniques is provided, which can be used as a comparison baseline. • Source code and data are made available to encourage the development of human language technologies (HLTs).
This paper presents and describes the eHealth-KD corpus. The corpus is a collection of 1173 Spanish health-related sentences, manually annotated with a general semantic structure that captures most of the content without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the annotation process and provides key metrics of the corpus. Finally, three baseline implementations, supported by machine learning models, were designed to assess the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018), and the findings obtained by participants are discussed. The eHealth-KD corpus provides the first step in the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains. |
doi_str_mv | 10.1016/j.jbi.2019.103172 |
format | Article |
pmid | 30965136 |
publisher | Elsevier Inc (United States) |
rights | Copyright © 2019 Elsevier Inc. All rights reserved. |
orcid | https://orcid.org/0000-0002-4052-7427 |
abbreviated_title | J Biomed Inform |
peer_reviewed | true |
open_access | free_for_read |
fulltext | fulltext |
identifier | ISSN: 1532-0464 |
ispartof | Journal of biomedical informatics, 2019-06, Vol.94, p.103172-103172, Article 103172 |
issn | 1532-0464; 1532-0480 |
language | eng |
recordid | cdi_proquest_miscellaneous_2207164722 |
source | Elsevier ScienceDirect Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Corpus; eHealth; Knowledge discovery; Spanish; Subject-Verb-Object |
title | A corpus to support eHealth Knowledge Discovery technologies |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T18%3A57%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20corpus%20to%20support%20eHealth%20Knowledge%20Discovery%20technologies&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=Piad-Morffis,%20Alejandro&rft.date=2019-06&rft.volume=94&rft.spage=103172&rft.epage=103172&rft.pages=103172-103172&rft.artnum=103172&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2019.103172&rft_dat=%3Cproquest_cross%3E2207164722%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2207164722&rft_id=info:pmid/30965136&rft_els_id=S1532046419300905&rfr_iscdi=true |
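The `url` field above is an OpenURL whose query string repeats most of the record's metadata as `rft.*` keys and `rft_id` identifiers. The following is a minimal sketch of how those fields could be decoded with Python's standard library. The function name `parse_openurl` and the shortened `OPENURL` constant are illustrative assumptions, not part of the catalog record; only a subset of the original query parameters is reproduced.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the record's OpenURL: only some of the original
# parameters are kept here for readability (an illustrative subset).
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=A%20corpus%20to%20support%20eHealth%20Knowledge%20Discovery%20technologies"
    "&rft.jtitle=Journal%20of%20biomedical%20informatics"
    "&rft.au=Piad-Morffis,%20Alejandro"
    "&rft.date=2019-06&rft.volume=94&rft.spage=103172"
    "&rft.issn=1532-0464&rft.eissn=1532-0480"
    "&rft_id=info:doi/10.1016/j.jbi.2019.103172"
    "&rft_id=info:pmid/30965136"
)


def parse_openurl(url):
    """Return (referent fields, identifiers) decoded from an OpenURL query.

    Hypothetical helper: `rft.*` keys become a plain dict keyed by the part
    after the prefix; repeated `rft_id` values are returned as a list.
    """
    query = parse_qs(urlsplit(url).query)  # percent-decoding happens here
    referent = {
        key[len("rft."):]: values[0]
        for key, values in query.items()
        if key.startswith("rft.")
    }
    identifiers = query.get("rft_id", [])
    return referent, identifiers


referent, identifiers = parse_openurl(OPENURL)
print(referent["atitle"])
print(identifiers)
```

Keeping `rft_id` as a list matters because the key is repeated in the URL (DOI, PMID, and others), so a plain dictionary lookup would silently keep only one value.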