On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions
Published in: | Journal of biomedical informatics, 2015-08, Vol. 56, pp. 318-332 |
---|---|
Main authors: | Oronoz, Maite; Gojenola, Koldo; Pérez, Alicia; de Ilarraza, Arantza Díaz; Casillas, Arantza |
Format: | Article |
Language: | English |
Online access: | Full text |
container_end_page | 332 |
---|---|
container_issue | |
container_start_page | 318 |
container_title | Journal of biomedical informatics |
container_volume | 56 |
creator | Oronoz, Maite; Gojenola, Koldo; Pérez, Alicia; de Ilarraza, Arantza Díaz; Casillas, Arantza |
description | [Graphical abstract omitted]
Highlights:
• Creation of a gold standard of electronic health records in Spanish.
• Annotation of diseases, drugs and adverse drug reaction (ADR) events.
• Quality: inter-annotator agreement of 90.53% for entities and 82.86% for events.
• Development and assessment of a linguistic analyzer for medical texts.
• Automatic ADR extraction with machine learning shows the potential of the corpus.
Abstract: The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard, composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a linguistic analyzer built for general-language corpora to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning. |
doi_str_mv | 10.1016/j.jbi.2015.06.016 |
format | Article |
fullrecord | PMID: 26141794; publisher: Elsevier Inc, United States; rights: Copyright © 2015 Elsevier Inc. All rights reserved.; peer reviewed; free to read; full text: https://www.sciencedirect.com/science/article/pii/S1532046415001264; MEDLINE/PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/26141794 |
fulltext | fulltext |
identifier | ISSN: 1532-0464 |
ispartof | Journal of biomedical informatics, 2015-08, Vol.56, p.318-332 |
issn | 1532-0464 (ISSN); 1532-0480 (EISSN) |
language | eng |
recordid | cdi_proquest_miscellaneous_1718962957 |
source | MEDLINE; Elsevier ScienceDirect Journals; EZB-FREE-00999 freely available EZB journals |
subjects | Adverse drug reaction; Adverse Drug Reaction Reporting Systems; Algorithms; Annotations; Automation; Clinical text; Data Mining - methods; Diseases; Drug-Related Side Effects and Adverse Reactions; Drugs; Electronic Health Records - standards; Ethics; Gold standard; Language; Linguistics; Machine Learning; Medical; Natural Language Processing; Pharmaceutical Preparations; Pharmacovigilance; Predictive Value of Tests; Reproducibility of Results; Text mining; Translating |
title | On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T00%3A06%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20creation%20of%20a%20clinical%20gold%20standard%20corpus%20in%20Spanish:%20Mining%20adverse%20drug%20reactions&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=Oronoz,%20Maite&rft.date=2015-08&rft.volume=56&rft.spage=318&rft.epage=332&rft.pages=318-332&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2015.06.016&rft_dat=%3Cproquest_cross%3E1709177750%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1703243075&rft_id=info:pmid/26141794&rft_els_id=S1532046415001264&rfr_iscdi=true |
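The description reports inter-annotator agreement of 90.53% for entities and 82.86% for events, but this record does not say which agreement measure was used. A common choice for span and relation annotation is pairwise F1 between two annotators, which is symmetric, so neither annotator has to serve as the reference. The following is a minimal sketch under that assumption, using exact matching on character offsets and labels. The labels `Disease`, `Drug` and `ADR`, the helper `span`, and the Spanish sentence are hypothetical toy data, not taken from IxaMed-GS.

```python
from typing import Hashable, Iterable, Tuple

# Hypothetical representation: an entity is (start, end, label) character
# offsets into the note; an event is (drug_entity, disease_entity, label).
Entity = Tuple[int, int, str]
Event = Tuple[Entity, Entity, str]


def pairwise_f1(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Pairwise F1 agreement between two annotators under exact matching.

    With m shared items, P = m/|A| and R = m/|B|, so F1 = 2m / (|A| + |B|).
    The formula is symmetric, so neither annotator has to be the reference.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty annotations agree
    matches = len(set_a & set_b)
    return 2 * matches / (len(set_a) + len(set_b))


def span(text: str, phrase: str, label: str) -> Entity:
    """Hypothetical helper: entity from the first occurrence of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


# Toy Spanish note (invented for illustration, not from IxaMed-GS).
note = "Paciente con insuficiencia renal tras tratamiento con cisplatino y dolor abdominal."

# Annotators A and B agree on two entities; B marks a shorter span for the
# second symptom, which counts as a miss under exact matching.
renal = span(note, "insuficiencia renal", "Disease")
drug = span(note, "cisplatino", "Drug")
ents_a = {renal, drug, span(note, "dolor abdominal", "Disease")}
ents_b = {renal, drug, span(note, "dolor", "Disease")}

# Both annotators link the drug to the renal failure as an ADR event.
events_a = {(drug, renal, "ADR")}
events_b = {(drug, renal, "ADR")}

print(f"entity agreement: {pairwise_f1(ents_a, ents_b):.2%}")  # 66.67%
print(f"event agreement:  {pairwise_f1(events_a, events_b):.2%}")  # 100.00%
```

Exact matching is strict: a partial-overlap criterion would credit the shorter `dolor` span, so agreement figures like the ones reported depend on the matching rule actually applied.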