Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping

Bibliographic details
Published in: International journal of medical informatics (Shannon, Ireland), 2019-04, Vol.124, p.90-96
Main authors: Kagawa, Rina; Shinohara, Emiko; Imai, Takeshi; Kawazoe, Yoshimasa; Ohe, Kazuhiko
Format: Article
Language: eng
Online access: Full text
container_end_page 96
container_issue
container_start_page 90
container_title International journal of medical informatics (Shannon, Ireland)
container_volume 124
creator Kagawa, Rina
Shinohara, Emiko
Imai, Takeshi
Kawazoe, Yoshimasa
Ohe, Kazuhiko
description •Our manual review of 487,300 clinical notes for 10 diseases showed that, for every target disease except diabetic nephropathy, some disease mentions do not denote the patient's diagnosis even though their syntax suggests that they do.
•If extracting disease mentions from clinical notes is adopted as a simple and robust electronic health record-based phenotyping algorithm, the bias caused by disease mentions that incorrectly signify a patient's diagnosis corresponds to a precision of 78.1% (on average) for free text in progress notes.
•Five categories of physicians' intentions for writing such disease mentions were also formulated: (1) Differential diagnosis, (2) Misinterpretation of meanings, (3) Possibility of suffering from the disease in the future, (4) Screening, pre-surgery screening, general meanings, and (5) Family history, diagnosis of another person.
Electronic health record (EHR)-based phenotyping is an automated technique for identifying patients diagnosed with a particular disease from EHR data. However, EHR-based phenotyping struggles to achieve high performance because clinical notes include disease mentions that ultimately signify something other than the patient's diagnosis (such as a differential diagnosis or screening). Our objective was to quantify the influence of such disease mentions on EHR-based phenotyping performance. Physicians manually reviewed whether the disease mentions in 487,300 clinical notes of 4,430 patients indicated the patients' diseases. Particular focus was placed on disease mentions that did not signify the patient's diagnosis even though no syntactic modifier or indicator appeared in the same sentence. Patients were then classified according to whether their clinical notes included such disease mentions.
Among the patients whose clinical notes included disease mentions without any modifier or indicator, the proportion whose disease mentions signified their diagnosis was 78.1% (on average). This value can be interpreted as the bias that disease mentions not signifying the patient's diagnosis introduce into the precision of EHR-based phenotyping that extracts disease mentions from clinical notes. This study quantified this bias, expressed as a precision value, for EHR-based phenotyping from four dataset types. The results of this study will help researchers in diverse research environments with different available data types.
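The patient-level precision described above can be illustrated with a short Python sketch. It is hypothetical and not the authors' method: the cue list `MODIFIER_CUES`, the helpers `has_unmodified_mention`, `phenotype`, and `precision`, and the toy notes are all assumptions made for illustration. A patient is flagged when any note mentions the disease in a sentence with no modifier cue, and precision is the share of flagged patients whose diagnosis was confirmed; in the study, that confirmation came from physicians' manual review.

```python
# Hypothetical sketch: patient-level precision of a naive mention-based
# phenotyping rule. The cue list, helper names, and toy notes are
# illustrative assumptions, not the study's algorithm or data.
import re
from dataclasses import dataclass

# Assumed cue phrases that syntactically mark a mention as NOT asserting the
# patient's diagnosis (negation, uncertainty, family history, screening).
MODIFIER_CUES = ("no ", "rule out", "r/o", "suspected", "family history", "screening", "?")


@dataclass(frozen=True)
class Note:
    patient_id: str
    text: str


def has_unmodified_mention(text: str, disease: str) -> bool:
    """True if any sentence mentions `disease` with no modifier cue in it."""
    target = disease.lower()
    for sentence in re.split(r"[.!\n]", text.lower()):
        if target in sentence and not any(cue in sentence for cue in MODIFIER_CUES):
            return True
    return False


def phenotype(notes: list[Note], disease: str) -> set[str]:
    """Flag a patient if at least one of their notes has an unmodified mention."""
    return {n.patient_id for n in notes if has_unmodified_mention(n.text, disease)}


def precision(flagged: set[str], confirmed: set[str]) -> float:
    """Share of flagged patients whose diagnosis was confirmed on manual review."""
    if not flagged:
        return float("nan")
    return len(flagged & confirmed) / len(flagged)


# Toy example (invented data).
notes = [
    Note("p1", "Type 2 diabetes mellitus. Continue metformin."),
    Note("p2", "Rule out diabetes mellitus. HbA1c ordered."),
    # Differential diagnosis written without any cue: still flagged.
    Note("p3", "Polyuria. Diabetes mellitus vs diabetes insipidus."),
    Note("p4", "Mother has diabetes mellitus (family history)."),
]
confirmed = {"p1"}  # hypothetical physician-review labels

flagged = phenotype(notes, "diabetes mellitus")
print(sorted(flagged), precision(flagged, confirmed))  # → ['p1', 'p3'] 0.5
```

Patient p3 shows the failure mode the description targets: a differential diagnosis written without any syntactic cue passes the filter and lowers precision, which no cue list can catch on its own.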
doi_str_mv 10.1016/j.ijmedinf.2018.12.004
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2184144002</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1386505618310165</els_id><sourcerecordid>2184144002</sourcerecordid><originalsourceid>FETCH-LOGICAL-c434t-fe3eb64e5e4b645a09e6e6f85bb48e96387c6ca501a5cfd5c54576a8c0a4130a3</originalsourceid><addsrcrecordid>eNqFkE1P3DAQhi1EBcvHX0A-9pIwjj9ibm23wK5ExYfgbDnOBLzK2oudrcS_x2ih155mpHneGc1DyBmDmgFT56var9bY-zDUDTBds6YGEHtkxnTbVLoRfL_0XKtKglSH5CjnFQBrQYoDcsih1ULwZkbuf3mbaRzoMljntslOSH_7jDYj_YNh8jFk6gO9HNFNKQbv6ALtOL3QB3Qx9VVXyJ7evWCI09vGh-cT8m2wY8bTz3pMnq4uH-eL6ub2ejn_eVM5wcVUDcixUwIlilKkhQtUqAYtu05ovFBct045K4FZ6YZeOilkq6x2YAXjYPkx-b7bu0nxdYt5MmufHY6jDRi32TRMCyYEQFNQtUNdijknHMwm-bVNb4aB-dBpVuZLp_nQaVhjis4SPPu8se3K-F_sy18BfuwALJ_-9ZhMdh6DK6tSEWb66P934x0Ro4m2</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2184144002</pqid></control><display><type>article</type><title>Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>Kagawa, Rina ; Shinohara, Emiko ; Imai, Takeshi ; Kawazoe, Yoshimasa ; Ohe, Kazuhiko</creator><creatorcontrib>Kagawa, Rina ; Shinohara, Emiko ; Imai, Takeshi ; Kawazoe, Yoshimasa ; Ohe, Kazuhiko</creatorcontrib><description>•Our manual review of 487,300 clinical notes for 10 diseases clarified the presence of disease mentions that do not connote the patient’s diagnosis contrary to syntactic characteristics for all object diseases, except diabetic nephropathy.•If extracting disease mentions from clinical notes is adopted as simple and robust electronic health record-based phenotyping algorithms, the bias occurred owing to disease mentions that incorrectly signify a patient’s diagnosis in the value of precision is 78.1% (on average) for free text in progress notes.•The following five categories of 
physicians’ intentions to write such disease mentions were also formulated: (1) Differential diagnosis, (2) Misinterpretation of meanings, (3) Possibility of suffering from the disease in the future, (4) Screening, pre-surgery screening, general meanings, and (5) Family history, diagnosis of another person. Electronic health record (EHR)-based phenotyping is an automated technique for identifying patients diagnosed with a particular disease using EHR data. However, EHR-based phenotyping has difficulties in achieving satisfactorily high performance because clinical notes include disease mentions that ultimately signify something other than the patient’s diagnosis (such as differential diagnosis or screening). Our objective is to quantify the influence of such disease mentions on EHR-based phenotyping performance. Physicians manually reviewed whether the disease mentions indicated the patients’ diseases in 487,300 clinical notes of 4,430 patients. Particular focus was placed on disease mentions that did not signify the patient’s diagnosis even though they did not have any syntactic modifier or indicator in the same sentences. Patients were then classified according to whether their clinical notes included such disease mentions. Among the patients whose clinical notes included disease mentions without any modifier or indicator, the proportion of patients whose disease mentions signified the patients’ diagnosis was 78.1% (on average). This value can be interpreted as the bias of disease mentions that did not signify the patient’s diagnosis on the precision of EHR-based phenotyping by extracting disease mentions from clinical notes. This study quantified the bias occurred owing to disease mentions that incorrectly signify a patient’s diagnosis in the value of precision of EHR-based phenotyping from four dataset types. 
The results of this study will help researchers in diverse research environments with different available data types.</description><identifier>ISSN: 1386-5056</identifier><identifier>EISSN: 1872-8243</identifier><identifier>DOI: 10.1016/j.ijmedinf.2018.12.004</identifier><identifier>PMID: 30784432</identifier><language>eng</language><publisher>Ireland: Elsevier B.V</publisher><ispartof>International journal of medical informatics (Shannon, Ireland), 2019-04, Vol.124, p.90-96</ispartof><rights>2018 Elsevier B.V.</rights><rights>Copyright © 2018 Elsevier B.V. All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c434t-fe3eb64e5e4b645a09e6e6f85bb48e96387c6ca501a5cfd5c54576a8c0a4130a3</citedby><cites>FETCH-LOGICAL-c434t-fe3eb64e5e4b645a09e6e6f85bb48e96387c6ca501a5cfd5c54576a8c0a4130a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S1386505618310165$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30784432$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Kagawa, Rina</creatorcontrib><creatorcontrib>Shinohara, Emiko</creatorcontrib><creatorcontrib>Imai, Takeshi</creatorcontrib><creatorcontrib>Kawazoe, Yoshimasa</creatorcontrib><creatorcontrib>Ohe, Kazuhiko</creatorcontrib><title>Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping</title><title>International journal of medical informatics (Shannon, Ireland)</title><addtitle>Int J Med Inform</addtitle><description>•Our manual review of 487,300 clinical notes for 10 diseases clarified the presence of disease mentions that do not connote the patient’s diagnosis contrary to 
syntactic characteristics for all object diseases, except diabetic nephropathy.•If extracting disease mentions from clinical notes is adopted as simple and robust electronic health record-based phenotyping algorithms, the bias occurred owing to disease mentions that incorrectly signify a patient’s diagnosis in the value of precision is 78.1% (on average) for free text in progress notes.•The following five categories of physicians’ intentions to write such disease mentions were also formulated: (1) Differential diagnosis, (2) Misinterpretation of meanings, (3) Possibility of suffering from the disease in the future, (4) Screening, pre-surgery screening, general meanings, and (5) Family history, diagnosis of another person. Electronic health record (EHR)-based phenotyping is an automated technique for identifying patients diagnosed with a particular disease using EHR data. However, EHR-based phenotyping has difficulties in achieving satisfactorily high performance because clinical notes include disease mentions that ultimately signify something other than the patient’s diagnosis (such as differential diagnosis or screening). Our objective is to quantify the influence of such disease mentions on EHR-based phenotyping performance. Physicians manually reviewed whether the disease mentions indicated the patients’ diseases in 487,300 clinical notes of 4,430 patients. Particular focus was placed on disease mentions that did not signify the patient’s diagnosis even though they did not have any syntactic modifier or indicator in the same sentences. Patients were then classified according to whether their clinical notes included such disease mentions. Among the patients whose clinical notes included disease mentions without any modifier or indicator, the proportion of patients whose disease mentions signified the patients’ diagnosis was 78.1% (on average). 
This value can be interpreted as the bias of disease mentions that did not signify the patient’s diagnosis on the precision of EHR-based phenotyping by extracting disease mentions from clinical notes. This study quantified the bias occurred owing to disease mentions that incorrectly signify a patient’s diagnosis in the value of precision of EHR-based phenotyping from four dataset types. The results of this study will help researchers in diverse research environments with different available data types.</description><issn>1386-5056</issn><issn>1872-8243</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNqFkE1P3DAQhi1EBcvHX0A-9pIwjj9ibm23wK5ExYfgbDnOBLzK2oudrcS_x2ih155mpHneGc1DyBmDmgFT56var9bY-zDUDTBds6YGEHtkxnTbVLoRfL_0XKtKglSH5CjnFQBrQYoDcsih1ULwZkbuf3mbaRzoMljntslOSH_7jDYj_YNh8jFk6gO9HNFNKQbv6ALtOL3QB3Qx9VVXyJ7evWCI09vGh-cT8m2wY8bTz3pMnq4uH-eL6ub2ejn_eVM5wcVUDcixUwIlilKkhQtUqAYtu05ovFBct045K4FZ6YZeOilkq6x2YAXjYPkx-b7bu0nxdYt5MmufHY6jDRi32TRMCyYEQFNQtUNdijknHMwm-bVNb4aB-dBpVuZLp_nQaVhjis4SPPu8se3K-F_sy18BfuwALJ_-9ZhMdh6DK6tSEWb66P934x0Ro4m2</recordid><startdate>201904</startdate><enddate>201904</enddate><creator>Kagawa, Rina</creator><creator>Shinohara, Emiko</creator><creator>Imai, Takeshi</creator><creator>Kawazoe, Yoshimasa</creator><creator>Ohe, Kazuhiko</creator><general>Elsevier B.V</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>201904</creationdate><title>Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping</title><author>Kagawa, Rina ; Shinohara, Emiko ; Imai, Takeshi ; Kawazoe, Yoshimasa ; Ohe, 
Kazuhiko</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c434t-fe3eb64e5e4b645a09e6e6f85bb48e96387c6ca501a5cfd5c54576a8c0a4130a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kagawa, Rina</creatorcontrib><creatorcontrib>Shinohara, Emiko</creatorcontrib><creatorcontrib>Imai, Takeshi</creatorcontrib><creatorcontrib>Kawazoe, Yoshimasa</creatorcontrib><creatorcontrib>Ohe, Kazuhiko</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>International journal of medical informatics (Shannon, Ireland)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kagawa, Rina</au><au>Shinohara, Emiko</au><au>Imai, Takeshi</au><au>Kawazoe, Yoshimasa</au><au>Ohe, Kazuhiko</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping</atitle><jtitle>International journal of medical informatics (Shannon, Ireland)</jtitle><addtitle>Int J Med Inform</addtitle><date>2019-04</date><risdate>2019</risdate><volume>124</volume><spage>90</spage><epage>96</epage><pages>90-96</pages><issn>1386-5056</issn><eissn>1872-8243</eissn><abstract>•Our manual review of 487,300 clinical notes for 10 diseases clarified the presence of disease mentions that do not connote the patient’s diagnosis contrary to syntactic characteristics for all object diseases, except diabetic nephropathy.•If extracting disease mentions from clinical notes is adopted as simple and robust electronic health record-based phenotyping algorithms, the bias occurred owing to disease mentions that incorrectly signify a patient’s diagnosis in the value of precision is 78.1% (on average) for free 
text in progress notes.•The following five categories of physicians’ intentions to write such disease mentions were also formulated: (1) Differential diagnosis, (2) Misinterpretation of meanings, (3) Possibility of suffering from the disease in the future, (4) Screening, pre-surgery screening, general meanings, and (5) Family history, diagnosis of another person. Electronic health record (EHR)-based phenotyping is an automated technique for identifying patients diagnosed with a particular disease using EHR data. However, EHR-based phenotyping has difficulties in achieving satisfactorily high performance because clinical notes include disease mentions that ultimately signify something other than the patient’s diagnosis (such as differential diagnosis or screening). Our objective is to quantify the influence of such disease mentions on EHR-based phenotyping performance. Physicians manually reviewed whether the disease mentions indicated the patients’ diseases in 487,300 clinical notes of 4,430 patients. Particular focus was placed on disease mentions that did not signify the patient’s diagnosis even though they did not have any syntactic modifier or indicator in the same sentences. Patients were then classified according to whether their clinical notes included such disease mentions. Among the patients whose clinical notes included disease mentions without any modifier or indicator, the proportion of patients whose disease mentions signified the patients’ diagnosis was 78.1% (on average). This value can be interpreted as the bias of disease mentions that did not signify the patient’s diagnosis on the precision of EHR-based phenotyping by extracting disease mentions from clinical notes. This study quantified the bias occurred owing to disease mentions that incorrectly signify a patient’s diagnosis in the value of precision of EHR-based phenotyping from four dataset types. 
The results of this study will help researchers in diverse research environments with different available data types.</abstract><cop>Ireland</cop><pub>Elsevier B.V</pub><pmid>30784432</pmid><doi>10.1016/j.ijmedinf.2018.12.004</doi><tpages>7</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1386-5056
ispartof International journal of medical informatics (Shannon, Ireland), 2019-04, Vol.124, p.90-96
issn 1386-5056
1872-8243
language eng
recordid cdi_proquest_miscellaneous_2184144002
source ScienceDirect Journals (5 years ago - present)
title Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T18%3A08%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bias%20of%20Inaccurate%20Disease%20Mentions%20in%20Electronic%20Health%20Record-based%20Phenotyping&rft.jtitle=International%20journal%20of%20medical%20informatics%20(Shannon,%20Ireland)&rft.au=Kagawa,%20Rina&rft.date=2019-04&rft.volume=124&rft.spage=90&rft.epage=96&rft.pages=90-96&rft.issn=1386-5056&rft.eissn=1872-8243&rft_id=info:doi/10.1016/j.ijmedinf.2018.12.004&rft_dat=%3Cproquest_cross%3E2184144002%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2184144002&rft_id=info:pmid/30784432&rft_els_id=S1386505618310165&rfr_iscdi=true