On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions

• Creation of a gold standard of electronic health records in Spanish.
• Annotation of diseases, drugs and adverse drug reaction (ADR) events.
• Quality: inter-annotator agreement of 90.53% for entities and 82.86% for events.
• Development and assessment of a linguistic analyzer for medical...

Detailed description

Bibliographic details
Published in:	Journal of biomedical informatics, 2015-08, Vol. 56, pp. 318-332
Main authors:	Oronoz, Maite; Gojenola, Koldo; Pérez, Alicia; de Ilarraza, Arantza Díaz; Casillas, Arantza
Format:	Article
Language:	English
Subjects:	Adverse drug reaction; Adverse Drug Reaction Reporting Systems; Algorithms; Annotations; Automation; Clinical text; Data Mining - methods; Diseases; Drug-Related Side Effects and Adverse Reactions; Drugs; Electronic Health Records - standards; Ethics; Gold standard; Language; Linguistics; Machine Learning; Medical; Natural Language Processing; Pharmaceutical Preparations; Pharmacovigilance; Predictive Value of Tests; Reproducibility of Results; Text mining; Translating

container_end_page	332
container_issue
container_start_page	318
container_title	Journal of biomedical informatics
container_volume	56
creator	Oronoz, Maite; Gojenola, Koldo; Pérez, Alicia; de Ilarraza, Arantza Díaz; Casillas, Arantza
description	• Creation of a gold standard of electronic health records in Spanish. • Annotation of diseases, drugs and adverse drug reaction (ADR) events. • Quality: inter-annotator agreement of 90.53% for entities and 82.86% for events. • Development and assessment of a linguistic analyzer for medical texts. • Automatic ADR extraction with machine learning shows the potential of the corpus. The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts rely on annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard, composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.
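The description reports inter-annotator agreement as one percentage for entities and one for events, but the record does not say how it was computed. One common convention for span annotation, assumed here and not confirmed by this record, is pairwise F1 with exact span-and-label matching. For two annotators this reduces to 2·|A∩B| / (|A| + |B|). The following is a minimal Python sketch under that assumption; `agreement_f1` and the toy spans are hypothetical and are not data from the IxaMed-GS corpus.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def agreement_f1(a: Set[Span], b: Set[Span]) -> float:
    """Pairwise F1 agreement with exact span-and-label matching (an assumption).

    With one annotator as reference and the other as response, precision
    and recall combine to 2*|A & B| / (|A| + |B|), so the score is symmetric.
    """
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    matched = len(a & b)
    return 2 * matched / (len(a) + len(b))


# Toy annotations of the same (imaginary) sentence by two annotators.
ann1 = {(0, 9, "Drug"), (15, 22, "Disease"), (30, 38, "Disease")}
ann2 = {(0, 9, "Drug"), (15, 22, "Disease"), (40, 45, "Drug")}

print(f"{agreement_f1(ann1, ann2):.2%}")  # 2 shared spans out of 3 + 3
```

Exact matching is the strictest choice. Schemes that also credit overlapping spans, or that use chance-corrected measures such as Cohen's kappa, would give different numbers for the same annotations.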
doi_str_mv	10.1016/j.jbi.2015.06.016
format	Article
publisher	Elsevier Inc (United States)
pmid	26141794
rights	Copyright © 2015 Elsevier Inc. All rights reserved.
status	Peer reviewed; free to read
fulltext_url	https://www.sciencedirect.com/science/article/pii/S1532046415001264
pubmed_url	https://www.ncbi.nlm.nih.gov/pubmed/26141794
identifier	ISSN: 1532-0464
ispartof	Journal of biomedical informatics, 2015-08, Vol.56, p.318-332
issn	1532-0464 1532-0480
language	eng
recordid	cdi_proquest_miscellaneous_1718962957
source	MEDLINE; Elsevier ScienceDirect Journals; EZB-FREE-00999 freely available EZB journals
subjects	Adverse drug reaction; Adverse Drug Reaction Reporting Systems; Algorithms; Annotations; Automation; Clinical text; Data Mining - methods; Diseases; Drug-Related Side Effects and Adverse Reactions; Drugs; Electronic Health Records - standards; Ethics; Gold standard; Language; Linguistics; Machine Learning; Medical; Natural Language Processing; Pharmaceutical Preparations; Pharmacovigilance; Predictive Value of Tests; Reproducibility of Results; Text mining; Translating
title	On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T00%3A06%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20creation%20of%20a%20clinical%20gold%20standard%20corpus%20in%20Spanish:%20Mining%20adverse%20drug%20reactions&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=Oronoz,%20Maite&rft.date=2015-08&rft.volume=56&rft.spage=318&rft.epage=332&rft.pages=318-332&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2015.06.016&rft_dat=%3Cproquest_cross%3E1709177750%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1703243075&rft_id=info:pmid/26141794&rft_els_id=S1532046415001264&rfr_iscdi=true
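The `url` field is an OpenURL link to the TUM SFX resolver in the key/encoded-value (KEV) format of ANSI/NISO Z39.88-2004 (`ctx_ver=Z39.88-2004`). The citation travels in the `rft.*` parameters, and identifiers such as the DOI and PMID travel in repeated `rft_id` parameters. The following is a minimal Python sketch that recovers them using only the standard library's `urllib.parse`. The function name `citation_from_openurl` is hypothetical, the sketch keeps only the first value of each `rft.*` key, and the example URL is a shortened copy of the field above with the resolver bookkeeping parameters removed.

```python
from urllib.parse import parse_qs, urlsplit


def citation_from_openurl(url: str) -> dict:
    """Collect rft.* citation fields and all rft_id identifiers from a KEV OpenURL."""
    params = parse_qs(urlsplit(url).query)  # percent-decodes; every value is a list
    # Keep the first value of each rft.* key, dropping the "rft." prefix.
    fields = {key[len("rft."):]: values[0]
              for key, values in params.items() if key.startswith("rft.")}
    # rft_id may repeat (DOI, PMID, ...), so keep every value in order.
    fields["ids"] = params.get("rft_id", [])
    return fields


# Shortened copy of the record's url field (resolver bookkeeping parameters dropped).
url = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.atitle=On%20the%20creation%20of%20a%20clinical%20gold%20standard"
    "%20corpus%20in%20Spanish:%20Mining%20adverse%20drug%20reactions"
    "&rft.jtitle=Journal%20of%20biomedical%20informatics"
    "&rft.au=Oronoz,%20Maite&rft.date=2015-08&rft.volume=56"
    "&rft.spage=318&rft.epage=332"
    "&rft_id=info:doi/10.1016/j.jbi.2015.06.016&rft_id=info:pmid/26141794"
)

citation = citation_from_openurl(url)
print(citation["au"], citation["date"], citation["volume"], citation["ids"])
```

Parameters such as `rft_val_fmt` and `rft_dat` start with `rft_` rather than `rft.`, so the filter leaves them out. Only `rft_id` is collected separately.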