On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions

• Creation of a gold standard of electronic health records in Spanish.
• Annotation of diseases, drugs and adverse drug reaction (ADR) events.
• Quality: inter-annotator agreement of 90.53% for entities and 82.86% for events.
• Development and assessment of linguistic analyzer for medical...

Bibliographic details
Published in: Journal of biomedical informatics 2015-08, Vol.56, p.318-332
Main authors: Oronoz, Maite; Gojenola, Koldo; Pérez, Alicia; de Ilarraza, Arantza Díaz; Casillas, Arantza
Format: Article
Language: eng
Online access: Full text
container_end_page 332
container_issue
container_start_page 318
container_title Journal of biomedical informatics
container_volume 56
creator Oronoz, Maite
Gojenola, Koldo
Pérez, Alicia
de Ilarraza, Arantza Díaz
Casillas, Arantza
description
• Creation of a gold standard of electronic health records in Spanish.
• Annotation of diseases, drugs and adverse drug reaction (ADR) events.
• Quality: inter-annotator agreement of 90.53% for entities and 82.86% for events.
• Development and assessment of linguistic analyzer for medical texts.
• Automatic ADR extraction with machine learning shows the potential of the corpus.
The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.
doi_str_mv 10.1016/j.jbi.2015.06.016
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1718962957</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1532046415001264</els_id><sourcerecordid>1709177750</sourcerecordid><originalsourceid>FETCH-LOGICAL-c485t-47c6a25fd520cb004f107bd718fe4b388335491be34c4a123a2596be004a61783</originalsourceid><addsrcrecordid>eNqNkU1v1DAQhi0EoqXwA7ggH7lsmIm_Ejihii-pqAfghmQ59mTrVdZZ7KRS_z1ebekR9TSj0fM-h3kZe43QIKB-t2t2Q2xaQNWAburlCTtHJdoNyA6ePuxanrEXpewAEJXSz9lZq1Gi6eU5-32d-HJD3GdyS5wTn0fuuJ9iit5NfDtPgZfFpeBy4H7Oh7XwmPiPg0ux3Lzn3yuYttyFW8qFeMjrlleVP7rKS_ZsdFOhV_fzgv36_Onn5dfN1fWXb5cfrzZedmrZSOO1a9UYVAt-AJAjghmCwW4kOYiuE0LJHgcS0kuHrahwrweqpNNoOnHB3p68hzz_Waksdh-Lp2lyiea1WKyqXre9Mo9AoUdjjILHoKKVAoyqKJ5Qn-dSMo32kOPe5TuLYI9V2Z2tVdljVRa0rZeaeXOvX4c9hYfEv24q8OEEUH3dbaRsi4-UPIWYyS82zPE_-r-D46HD</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1703243075</pqid></control><display><type>article</type><title>On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Oronoz, Maite ; Gojenola, Koldo ; Pérez, Alicia ; de Ilarraza, Arantza Díaz ; Casillas, Arantza</creator><creatorcontrib>Oronoz, Maite ; Gojenola, Koldo ; Pérez, Alicia ; de Ilarraza, Arantza Díaz ; Casillas, Arantza</creatorcontrib><description>[Display omitted] •Creation of a gold standard of electronic health records in Spanish.•Annotation of diseases, drugs and adverse drug reaction (ADR) events.•Quality: inter-annotator agreement of 90.53% for entities and 82.86% for events.•Development and assessment of linguistic analyzer for medical texts.•Automatic ADR extraction with machine learning shows the potential of the corpus. 
The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.</description><identifier>ISSN: 1532-0464</identifier><identifier>EISSN: 1532-0480</identifier><identifier>DOI: 10.1016/j.jbi.2015.06.016</identifier><identifier>PMID: 26141794</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Adverse drug reaction ; Adverse Drug Reaction Reporting Systems ; Algorithms ; Annotations ; Automation ; Clinical text ; Data Mining - methods ; Diseases ; Drug-Related Side Effects and Adverse Reactions ; Drugs ; Electronic Health Records - standards ; Ethics ; Gold standard ; Language ; Linguistics ; Machine Learning ; Medical ; Natural Language Processing ; Pharmaceutical Preparations ; Pharmacovigilance ; Predictive Value of Tests ; Reproducibility of Results ; Text mining ; Translating</subject><ispartof>Journal of biomedical informatics, 2015-08, Vol.56, p.318-332</ispartof><rights>2015 Elsevier 
Inc.</rights><rights>Copyright © 2015 Elsevier Inc. All rights reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c485t-47c6a25fd520cb004f107bd718fe4b388335491be34c4a123a2596be004a61783</citedby><cites>FETCH-LOGICAL-c485t-47c6a25fd520cb004f107bd718fe4b388335491be34c4a123a2596be004a61783</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S1532046415001264$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/26141794$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Oronoz, Maite</creatorcontrib><creatorcontrib>Gojenola, Koldo</creatorcontrib><creatorcontrib>Pérez, Alicia</creatorcontrib><creatorcontrib>de Ilarraza, Arantza Díaz</creatorcontrib><creatorcontrib>Casillas, Arantza</creatorcontrib><title>On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions</title><title>Journal of biomedical informatics</title><addtitle>J Biomed Inform</addtitle><description>[Display omitted] •Creation of a gold standard of electronic health records in Spanish.•Annotation of diseases, drugs and adverse drug reaction (ADR) events.•Quality: inter-annotator agreement of 90.53% for entities and 82.86% for events.•Development and assessment of linguistic analyzer for medical texts.•Automatic ADR extraction with machine learning shows the potential of the corpus. The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. 
Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.</description><subject>Adverse drug reaction</subject><subject>Adverse Drug Reaction Reporting Systems</subject><subject>Algorithms</subject><subject>Annotations</subject><subject>Automation</subject><subject>Clinical text</subject><subject>Data Mining - methods</subject><subject>Diseases</subject><subject>Drug-Related Side Effects and Adverse Reactions</subject><subject>Drugs</subject><subject>Electronic Health Records - standards</subject><subject>Ethics</subject><subject>Gold standard</subject><subject>Language</subject><subject>Linguistics</subject><subject>Machine Learning</subject><subject>Medical</subject><subject>Natural Language Processing</subject><subject>Pharmaceutical Preparations</subject><subject>Pharmacovigilance</subject><subject>Predictive Value of Tests</subject><subject>Reproducibility of Results</subject><subject>Text 
mining</subject><subject>Translating</subject><issn>1532-0464</issn><issn>1532-0480</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkU1v1DAQhi0EoqXwA7ggH7lsmIm_Ejihii-pqAfghmQ59mTrVdZZ7KRS_z1ebekR9TSj0fM-h3kZe43QIKB-t2t2Q2xaQNWAburlCTtHJdoNyA6ePuxanrEXpewAEJXSz9lZq1Gi6eU5-32d-HJD3GdyS5wTn0fuuJ9iit5NfDtPgZfFpeBy4H7Oh7XwmPiPg0ux3Lzn3yuYttyFW8qFeMjrlleVP7rKS_ZsdFOhV_fzgv36_Onn5dfN1fWXb5cfrzZedmrZSOO1a9UYVAt-AJAjghmCwW4kOYiuE0LJHgcS0kuHrahwrweqpNNoOnHB3p68hzz_Waksdh-Lp2lyiea1WKyqXre9Mo9AoUdjjILHoKKVAoyqKJ5Qn-dSMo32kOPe5TuLYI9V2Z2tVdljVRa0rZeaeXOvX4c9hYfEv24q8OEEUH3dbaRsi4-UPIWYyS82zPE_-r-D46HD</recordid><startdate>201508</startdate><enddate>201508</enddate><creator>Oronoz, Maite</creator><creator>Gojenola, Koldo</creator><creator>Pérez, Alicia</creator><creator>de Ilarraza, Arantza Díaz</creator><creator>Casillas, Arantza</creator><general>Elsevier Inc</general><scope>6I.</scope><scope>AAFTH</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7QO</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>7SC</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201508</creationdate><title>On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions</title><author>Oronoz, Maite ; Gojenola, Koldo ; Pérez, Alicia ; de Ilarraza, Arantza Díaz ; Casillas, Arantza</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c485t-47c6a25fd520cb004f107bd718fe4b388335491be34c4a123a2596be004a61783</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Adverse drug reaction</topic><topic>Adverse Drug Reaction Reporting 
Systems</topic><topic>Algorithms</topic><topic>Annotations</topic><topic>Automation</topic><topic>Clinical text</topic><topic>Data Mining - methods</topic><topic>Diseases</topic><topic>Drug-Related Side Effects and Adverse Reactions</topic><topic>Drugs</topic><topic>Electronic Health Records - standards</topic><topic>Ethics</topic><topic>Gold standard</topic><topic>Language</topic><topic>Linguistics</topic><topic>Machine Learning</topic><topic>Medical</topic><topic>Natural Language Processing</topic><topic>Pharmaceutical Preparations</topic><topic>Pharmacovigilance</topic><topic>Predictive Value of Tests</topic><topic>Reproducibility of Results</topic><topic>Text mining</topic><topic>Translating</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Oronoz, Maite</creatorcontrib><creatorcontrib>Gojenola, Koldo</creatorcontrib><creatorcontrib>Pérez, Alicia</creatorcontrib><creatorcontrib>de Ilarraza, Arantza Díaz</creatorcontrib><creatorcontrib>Casillas, Arantza</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Biotechnology Research Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of biomedical informatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Oronoz, Maite</au><au>Gojenola, Koldo</au><au>Pérez, Alicia</au><au>de Ilarraza, Arantza Díaz</au><au>Casillas, Arantza</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions</atitle><jtitle>Journal of biomedical informatics</jtitle><addtitle>J Biomed Inform</addtitle><date>2015-08</date><risdate>2015</risdate><volume>56</volume><spage>318</spage><epage>332</epage><pages>318-332</pages><issn>1532-0464</issn><eissn>1532-0480</eissn><abstract>[Display omitted] •Creation of a gold standard of electronic health records in Spanish.•Annotation of diseases, drugs and adverse drug reaction (ADR) events.•Quality: inter-annotator agreement of 90.53% for entities and 82.86% for events.•Development and assessment of linguistic analyzer for medical texts.•Automatic ADR extraction with machine learning shows the potential of the corpus. The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. 
To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>26141794</pmid><doi>10.1016/j.jbi.2015.06.016</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1532-0464
ispartof Journal of biomedical informatics, 2015-08, Vol.56, p.318-332
issn 1532-0464
1532-0480
language eng
recordid cdi_proquest_miscellaneous_1718962957
source MEDLINE; Elsevier ScienceDirect Journals; EZB-FREE-00999 freely available EZB journals
subjects Adverse drug reaction
Adverse Drug Reaction Reporting Systems
Algorithms
Annotations
Automation
Clinical text
Data Mining - methods
Diseases
Drug-Related Side Effects and Adverse Reactions
Drugs
Electronic Health Records - standards
Ethics
Gold standard
Language
Linguistics
Machine Learning
Medical
Natural Language Processing
Pharmaceutical Preparations
Pharmacovigilance
Predictive Value of Tests
Reproducibility of Results
Text mining
Translating
title On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T00%3A06%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20creation%20of%20a%20clinical%20gold%20standard%20corpus%20in%20Spanish:%20Mining%20adverse%20drug%20reactions&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=Oronoz,%20Maite&rft.date=2015-08&rft.volume=56&rft.spage=318&rft.epage=332&rft.pages=318-332&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2015.06.016&rft_dat=%3Cproquest_cross%3E1709177750%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1703243075&rft_id=info:pmid/26141794&rft_els_id=S1532046415001264&rfr_iscdi=true