Demonstrating the consequences of learning missingness patterns in early warning systems for preventative health care: A novel simulation and solution

Bibliographic details
Published in:	Journal of biomedical informatics 2020-10, Vol.110, p.103528-103528, Article 103528
Main authors:	Gillies, Christopher E.; Taylor, Daniel F.; Cummings, Brandon C.; Ansari, Sardar; Islim, Fadi; Kronick, Steven L.; Medlin, Richard P.; Ward, Kevin R.
Format:	Article
Language:	eng
Keywords:	Early warning systems; Machine learning; Patient deterioration; Simulation; Tree-based methods; Variational autoencoder
Online access:	Full text

container_end_page	103528
container_issue
container_start_page	103528
container_title	Journal of biomedical informatics
container_volume	110
creator	Gillies, Christopher E.; Taylor, Daniel F.; Cummings, Brandon C.; Ansari, Sardar; Islim, Fadi; Kronick, Steven L.; Medlin, Richard P.; Ward, Kevin R.
description	• Early warning system performance worsens if the missingness pattern in EHR data changes. • Synthetic EHR data were generated with a variational autoencoder and a custom loss function. • Randomized and Bayesian regression imputation are appropriate for tree-based methods. • Using proper imputation, we developed PICTURE to predict patient deterioration. • PICTURE performance is comparable to that of current systems, and it can explain its predictions. When using tree-based methods to develop predictive analytics and early warning systems for preventive health care, it is important to use an appropriate imputation method so that the model does not learn the missingness pattern. To demonstrate this, we developed a novel simulation that generated synthetic electronic health record (EHR) data using a variational autoencoder with a custom loss function that accounted for the high missing rate of EHR data. We showed that when tree-based methods learn missingness patterns that are correlated with adverse events in EHR data, performance decreases if the system is used in a new setting with different missingness patterns. In this scenario, performance is worst when the difference in missing rate between patients with and without an adverse event is greatest. We found that randomized and Bayesian regression imputation methods mitigate this problem for tree-based methods. We used these findings to build a novel early warning system for predicting patient deterioration in general wards and telemetry units: PICTURE (Predicting Intensive Care Transfers and other UnfoReseen Events). To develop, tune, and test PICTURE, we used laboratory results and vital signs from the EHRs of adult patients over four years (n = 133,089 encounters). The primary outcomes were unplanned intensive care unit transfer, emergency vasoactive medication administration, cardiac arrest, and death. We compared PICTURE with existing early warning systems and logistic regression at multiple levels of granularity. On the testing set, using all observations within a hospital encounter (event rate = 3.4%), PICTURE had an area under the receiver operating characteristic curve (AUROC) of 0.83 and an adjusted (event rate = 4%) area under the precision-recall curve (AUPR) of 0.27. The next best tested method, regularized logistic regression, had an AUROC of 0.80 and an adjusted AUPR of 0.22. To ensure system interpretability, we applied a state-of-the-art prediction explainer that provides a ranked list of the features contributing most to each prediction. Although machine learning–based early warning systems are currently difficult to compare, a rudimentary comparison with published scores demonstrated that PICTURE is on par with state-of-the-art machine learning systems. To facilitate more robust comparisons and the development of future early warning systems, we have released our variational autoencoder's code and weights so that researchers can (a) test their models on data similar to our institution's and (b) create their own synthetic datasets.
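The abstract's central claim is that a tree-based model can learn missingness when imputation leaves a recognizable mark in the data. The following is a minimal Python sketch under several assumptions, and it is not the authors' simulation or code. It uses a single hypothetical lab value that has the same distribution whether or not an event occurs. Missing rates are made up and differ by outcome (20% vs 60%). "Randomized imputation" is read here as drawing each missing entry from the observed values (random hot-deck). Mean imputation puts every missing row at one constant, which a tree can isolate with two splits. Random draws leave no such spike.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple

Value = Optional[float]  # None marks a missing lab result


def simulate(n: int, rng: random.Random) -> List[Tuple[Value, bool]]:
    """Hypothetical cohort: the lab value itself carries no signal,
    but whether it was measured depends on the outcome."""
    rows = []
    for _ in range(n):
        event = rng.random() < 0.10
        value: Value = rng.gauss(100.0, 15.0)  # same distribution in both groups
        # Made-up missing rates: patients who later deteriorate are measured more often.
        p_missing = 0.20 if event else 0.60
        if rng.random() < p_missing:
            value = None
        rows.append((value, event))
    return rows


def mean_impute(column: List[Value]) -> List[float]:
    """Fill every gap with one constant: the mean of the observed values."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def random_impute(column: List[Value], rng: random.Random) -> List[float]:
    """Fill each gap with an independent draw from the observed values
    (random hot-deck; an assumed reading of 'randomized imputation')."""
    observed = [v for v in column if v is not None]
    return [rng.choice(observed) if v is None else v for v in column]


def rate_at_value(column: List[float], labels: List[bool], value: float) -> Tuple[int, float]:
    """Count and event rate of rows whose value equals `value` exactly.
    A tree can isolate such rows with two splits, one on each side of `value`."""
    hits = [y for x, y in zip(column, labels) if x == value]
    return len(hits), (sum(hits) / len(hits) if hits else float("nan"))


rng = random.Random(0)
rows = simulate(5000, rng)
values = [v for v, _ in rows]
labels = [y for _, y in rows]
fill = mean(v for v in values if v is not None)
mean_col = mean_impute(values)
rand_col = random_impute(values, random.Random(1))

print("overall event rate:", sum(labels) / len(labels))
print("mean imputation, rows at fill value (count, event rate):", rate_at_value(mean_col, labels, fill))
print("random imputation, rows at fill value (count, event rate):", rate_at_value(rand_col, labels, fill))
```

Under mean imputation, the rows sitting exactly at the fill value are precisely the missing rows. Their event rate differs from the overall rate, so a tree can exploit that split. This is the pattern that stops transferring when a new site measures patients differently. A Bayesian regression alternative would draw imputations from a posterior predictive distribution instead of the observed marginal. For example, scikit-learn's `IterativeImputer` with `sample_posterior=True` does this, but it is not shown here.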
doi_str_mv	10.1016/j.jbi.2020.103528
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2434472455</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1532046420301568</els_id><sourcerecordid>2434472455</sourcerecordid><originalsourceid>FETCH-LOGICAL-c396t-6809e932dbf6c1ad4611da3becfaf79ce8de16a4406749def38466b635e3697a3</originalsourceid><addsrcrecordid>eNp9UcuO1DAQtBCIXRY-gAvykcsMdvxIAqfV8pRW4gJny7E7jEeJPbidWc2P8L04mmGPnLpLVV1SdRHymrMtZ1y_22_3Q9g2rFmxUE33hFxzJZoNkx17-rhreUVeIO4Z41wp_ZxciabtlWL6mvz5CHOKWLItIf6iZQfUVQy_F4gOkKaRTmBzXMk5INYZAZEebCmQI9IQaeWnE324qPCEBWakY8r0kOEIsVTvI9Ad2KnsqLMZ3tNbGtMRJophXqbKp0ht9BTTtKzgJXk22gnh1WXekJ-fP_24-7q5__7l293t_caJXpeN7lgPvWj8MGrHrZeac2_FAG60Y9s76DxwbaVkupW9h1F0UutBCwVC960VN-Tt2feQU42MxdSQDqbJRkgLmkYKKdtGKlWl_Cx1OSFmGM0hh9nmk-HMrHWYval1mLUOc66j3ry52C_DDP7x4t__q-DDWQA15DFANujC-nkfMrhifAr_sf8LF1KfFw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2434472455</pqid></control><display><type>article</type><title>Demonstrating the consequences of learning missingness patterns in early warning systems for preventative health care: A novel simulation and solution</title><source>Elsevier ScienceDirect Journals Complete</source><source>EZB Electronic Journals Library</source><creator>Gillies, Christopher E. ; Taylor, Daniel F. ; Cummings, Brandon C. ; Ansari, Sardar ; Islim, Fadi ; Kronick, Steven L. ; Medlin, Richard P. ; Ward, Kevin R.</creator><creatorcontrib>Gillies, Christopher E. ; Taylor, Daniel F. ; Cummings, Brandon C. ; Ansari, Sardar ; Islim, Fadi ; Kronick, Steven L. ; Medlin, Richard P. 
; Ward, Kevin R.</creatorcontrib><description>[Display omitted] •Early warning system performance worsens if missingness pattern changes in EHR data.•Generated synthetic EHR data with variational autoencoder and custom loss function.•Randomized and Bayesian regression imputation appropriate for tree-based methods.•Using proper imputation, we developed PICTURE to predict patient deterioration.•PICTURE performance is comparable to current systems and it can explain predictions. When using tree-based methods to develop predictive analytics and early warning systems for preventive healthcare, it is important to use an appropriate imputation method to prevent learning the missingness pattern. To demonstrate this, we developed a novel simulation that generated synthetic electronic health record data using a variational autoencoder with a custom loss function, which took into account the high missing rate of electronic health data. We showed that when tree-based methods learn missingness patterns (correlated with adverse events) in electronic health record data, this leads to decreased performance if the system is used in a new setting that has different missingness patterns. Performance is worst in this scenario when the missing rate between those with and without an adverse event is the greatest. We found that randomized and Bayesian regression imputation methods mitigate the issue of learning the missingness pattern for tree-based methods. We used this information to build a novel early warning system for predicting patient deterioration in general wards and telemetry units: PICTURE (Predicting Intensive Care Transfers and other UnfoReseen Events). To develop, tune, and test PICTURE, we used labs and vital signs from electronic health records of adult patients over four years (n = 133,089 encounters). We analyzed primary outcomes of unplanned intensive care unit transfer, emergency vasoactive medication administration, cardiac arrest, and death. 
We compared PICTURE with existing early warning systems and logistic regression at multiple levels of granularity. When analyzing PICTURE on the testing set using all observations within a hospital encounter (event rate = 3.4%), PICTURE had an area under the receiver operating characteristic curve (AUROC) of 0.83 and an adjusted (event rate = 4%) area under the precision-recall curve (AUPR) of 0.27, while the next best tested method—regularized logistic regression—had an AUROC of 0.80 and an adjusted AUPR of 0.22. To ensure system interpretability, we applied a state-of-the-art prediction explainer that provided a ranked list of features contributing most to the prediction. Though it is currently difficult to compare machine learning–based early warning systems, a rudimentary comparison with published scores demonstrated that PICTURE is on par with state-of-the-art machine learning systems. To facilitate more robust comparisons and development of early warning systems in the future, we have released our variational autoencoder’s code and weights so researchers can (a) test their models on data similar to our institution and (b) make their own synthetic datasets.</description><identifier>ISSN: 1532-0464</identifier><identifier>EISSN: 1532-0480</identifier><identifier>DOI: 10.1016/j.jbi.2020.103528</identifier><identifier>PMID: 32795506</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Early warning systems ; Machine learning ; Patient deterioration ; Simulation ; Tree-based methods ; Variational autoencoder</subject><ispartof>Journal of biomedical informatics, 2020-10, Vol.110, p.103528-103528, Article 103528</ispartof><rights>2020 Elsevier Inc.</rights><rights>Copyright © 2020 Elsevier Inc. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c396t-6809e932dbf6c1ad4611da3becfaf79ce8de16a4406749def38466b635e3697a3</citedby><cites>FETCH-LOGICAL-c396t-6809e932dbf6c1ad4611da3becfaf79ce8de16a4406749def38466b635e3697a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.jbi.2020.103528$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32795506$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Gillies, Christopher E.</creatorcontrib><creatorcontrib>Taylor, Daniel F.</creatorcontrib><creatorcontrib>Cummings, Brandon C.</creatorcontrib><creatorcontrib>Ansari, Sardar</creatorcontrib><creatorcontrib>Islim, Fadi</creatorcontrib><creatorcontrib>Kronick, Steven L.</creatorcontrib><creatorcontrib>Medlin, Richard P.</creatorcontrib><creatorcontrib>Ward, Kevin R.</creatorcontrib><title>Demonstrating the consequences of learning missingness patterns in early warning systems for preventative health care: A novel simulation and solution</title><title>Journal of biomedical informatics</title><addtitle>J Biomed Inform</addtitle><description>[Display omitted] •Early warning system performance worsens if missingness pattern changes in EHR data.•Generated synthetic EHR data with variational autoencoder and custom loss function.•Randomized and Bayesian regression imputation appropriate for tree-based methods.•Using proper imputation, we developed PICTURE to predict patient deterioration.•PICTURE performance is comparable to current systems and it can explain predictions. 
When using tree-based methods to develop predictive analytics and early warning systems for preventive healthcare, it is important to use an appropriate imputation method to prevent learning the missingness pattern. To demonstrate this, we developed a novel simulation that generated synthetic electronic health record data using a variational autoencoder with a custom loss function, which took into account the high missing rate of electronic health data. We showed that when tree-based methods learn missingness patterns (correlated with adverse events) in electronic health record data, this leads to decreased performance if the system is used in a new setting that has different missingness patterns. Performance is worst in this scenario when the missing rate between those with and without an adverse event is the greatest. We found that randomized and Bayesian regression imputation methods mitigate the issue of learning the missingness pattern for tree-based methods. We used this information to build a novel early warning system for predicting patient deterioration in general wards and telemetry units: PICTURE (Predicting Intensive Care Transfers and other UnfoReseen Events). To develop, tune, and test PICTURE, we used labs and vital signs from electronic health records of adult patients over four years (n = 133,089 encounters). We analyzed primary outcomes of unplanned intensive care unit transfer, emergency vasoactive medication administration, cardiac arrest, and death. We compared PICTURE with existing early warning systems and logistic regression at multiple levels of granularity. 
When analyzing PICTURE on the testing set using all observations within a hospital encounter (event rate = 3.4%), PICTURE had an area under the receiver operating characteristic curve (AUROC) of 0.83 and an adjusted (event rate = 4%) area under the precision-recall curve (AUPR) of 0.27, while the next best tested method—regularized logistic regression—had an AUROC of 0.80 and an adjusted AUPR of 0.22. To ensure system interpretability, we applied a state-of-the-art prediction explainer that provided a ranked list of features contributing most to the prediction. Though it is currently difficult to compare machine learning–based early warning systems, a rudimentary comparison with published scores demonstrated that PICTURE is on par with state-of-the-art machine learning systems. To facilitate more robust comparisons and development of early warning systems in the future, we have released our variational autoencoder’s code and weights so researchers can (a) test their models on data similar to our institution and (b) make their own synthetic datasets.</description><subject>Early warning systems</subject><subject>Machine learning</subject><subject>Patient deterioration</subject><subject>Simulation</subject><subject>Tree-based methods</subject><subject>Variational 
autoencoder</subject><issn>1532-0464</issn><issn>1532-0480</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp9UcuO1DAQtBCIXRY-gAvykcsMdvxIAqfV8pRW4gJny7E7jEeJPbidWc2P8L04mmGPnLpLVV1SdRHymrMtZ1y_22_3Q9g2rFmxUE33hFxzJZoNkx17-rhreUVeIO4Z41wp_ZxciabtlWL6mvz5CHOKWLItIf6iZQfUVQy_F4gOkKaRTmBzXMk5INYZAZEebCmQI9IQaeWnE324qPCEBWakY8r0kOEIsVTvI9Ad2KnsqLMZ3tNbGtMRJophXqbKp0ht9BTTtKzgJXk22gnh1WXekJ-fP_24-7q5__7l293t_caJXpeN7lgPvWj8MGrHrZeac2_FAG60Y9s76DxwbaVkupW9h1F0UutBCwVC960VN-Tt2feQU42MxdSQDqbJRkgLmkYKKdtGKlWl_Cx1OSFmGM0hh9nmk-HMrHWYval1mLUOc66j3ry52C_DDP7x4t__q-DDWQA15DFANujC-nkfMrhifAr_sf8LF1KfFw</recordid><startdate>202010</startdate><enddate>202010</enddate><creator>Gillies, Christopher E.</creator><creator>Taylor, Daniel F.</creator><creator>Cummings, Brandon C.</creator><creator>Ansari, Sardar</creator><creator>Islim, Fadi</creator><creator>Kronick, Steven L.</creator><creator>Medlin, Richard P.</creator><creator>Ward, Kevin R.</creator><general>Elsevier Inc</general><scope>6I.</scope><scope>AAFTH</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>202010</creationdate><title>Demonstrating the consequences of learning missingness patterns in early warning systems for preventative health care: A novel simulation and solution</title><author>Gillies, Christopher E. ; Taylor, Daniel F. ; Cummings, Brandon C. ; Ansari, Sardar ; Islim, Fadi ; Kronick, Steven L. ; Medlin, Richard P. 
; Ward, Kevin R.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c396t-6809e932dbf6c1ad4611da3becfaf79ce8de16a4406749def38466b635e3697a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Early warning systems</topic><topic>Machine learning</topic><topic>Patient deterioration</topic><topic>Simulation</topic><topic>Tree-based methods</topic><topic>Variational autoencoder</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gillies, Christopher E.</creatorcontrib><creatorcontrib>Taylor, Daniel F.</creatorcontrib><creatorcontrib>Cummings, Brandon C.</creatorcontrib><creatorcontrib>Ansari, Sardar</creatorcontrib><creatorcontrib>Islim, Fadi</creatorcontrib><creatorcontrib>Kronick, Steven L.</creatorcontrib><creatorcontrib>Medlin, Richard P.</creatorcontrib><creatorcontrib>Ward, Kevin R.</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of biomedical informatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gillies, Christopher E.</au><au>Taylor, Daniel F.</au><au>Cummings, Brandon C.</au><au>Ansari, Sardar</au><au>Islim, Fadi</au><au>Kronick, Steven L.</au><au>Medlin, Richard P.</au><au>Ward, Kevin R.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Demonstrating the consequences of learning missingness patterns in early warning systems for preventative health care: A novel simulation and solution</atitle><jtitle>Journal of biomedical informatics</jtitle><addtitle>J Biomed 
Inform</addtitle><date>2020-10</date><risdate>2020</risdate><volume>110</volume><spage>103528</spage><epage>103528</epage><pages>103528-103528</pages><artnum>103528</artnum><issn>1532-0464</issn><eissn>1532-0480</eissn><abstract>[Display omitted] •Early warning system performance worsens if missingness pattern changes in EHR data.•Generated synthetic EHR data with variational autoencoder and custom loss function.•Randomized and Bayesian regression imputation appropriate for tree-based methods.•Using proper imputation, we developed PICTURE to predict patient deterioration.•PICTURE performance is comparable to current systems and it can explain predictions. When using tree-based methods to develop predictive analytics and early warning systems for preventive healthcare, it is important to use an appropriate imputation method to prevent learning the missingness pattern. To demonstrate this, we developed a novel simulation that generated synthetic electronic health record data using a variational autoencoder with a custom loss function, which took into account the high missing rate of electronic health data. We showed that when tree-based methods learn missingness patterns (correlated with adverse events) in electronic health record data, this leads to decreased performance if the system is used in a new setting that has different missingness patterns. Performance is worst in this scenario when the missing rate between those with and without an adverse event is the greatest. We found that randomized and Bayesian regression imputation methods mitigate the issue of learning the missingness pattern for tree-based methods. We used this information to build a novel early warning system for predicting patient deterioration in general wards and telemetry units: PICTURE (Predicting Intensive Care Transfers and other UnfoReseen Events). 
To develop, tune, and test PICTURE, we used labs and vital signs from electronic health records of adult patients over four years (n = 133,089 encounters). We analyzed primary outcomes of unplanned intensive care unit transfer, emergency vasoactive medication administration, cardiac arrest, and death. We compared PICTURE with existing early warning systems and logistic regression at multiple levels of granularity. When analyzing PICTURE on the testing set using all observations within a hospital encounter (event rate = 3.4%), PICTURE had an area under the receiver operating characteristic curve (AUROC) of 0.83 and an adjusted (event rate = 4%) area under the precision-recall curve (AUPR) of 0.27, while the next best tested method—regularized logistic regression—had an AUROC of 0.80 and an adjusted AUPR of 0.22. To ensure system interpretability, we applied a state-of-the-art prediction explainer that provided a ranked list of features contributing most to the prediction. Though it is currently difficult to compare machine learning–based early warning systems, a rudimentary comparison with published scores demonstrated that PICTURE is on par with state-of-the-art machine learning systems. To facilitate more robust comparisons and development of early warning systems in the future, we have released our variational autoencoder’s code and weights so researchers can (a) test their models on data similar to our institution and (b) make their own synthetic datasets.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>32795506</pmid><doi>10.1016/j.jbi.2020.103528</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
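The AUROC and "adjusted (event rate = 4%)" AUPR figures reported above can be illustrated with a small Python sketch. The record does not state how the adjustment was computed. This sketch assumes one plausible reading: at each threshold, precision is re-expressed at a reference prevalence π from that threshold's true-positive rate (TPR) and false-positive rate (FPR), as precision = π·TPR / (π·TPR + (1 − π)·FPR). The PR area is then accumulated step-wise over recall. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) after each distinct threshold, from the highest score down.
    Assumes both classes are present."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Trapezoidal area under the ROC curve."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def adjusted_precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision the same threshold would have at the reference prevalence."""
    denom = prevalence * tpr + (1 - prevalence) * fpr
    return prevalence * tpr / denom if denom > 0 else 1.0


def adjusted_aupr(scores: Sequence[float], labels: Sequence[int], prevalence: float) -> float:
    """Step-wise (average-precision style) area under the PR curve,
    with precision re-expressed at `prevalence`."""
    pts = roc_points(scores, labels)
    return sum(
        (tpr1 - tpr0) * adjusted_precision(tpr1, fpr1, prevalence)
        for (_, tpr0), (fpr1, tpr1) in zip(pts, pts[1:])
    )


scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(auroc(scores, labels))  # 0.75
print(adjusted_aupr(scores, labels, prevalence=0.5))
print(adjusted_aupr(scores, labels, prevalence=0.04))
```

When `prevalence` equals the sample's own event rate (0.5 in this toy example), the adjusted precision equals ordinary precision, and the function reduces to average precision. Lowering the reference prevalence lowers the AUPR while leaving AUROC unchanged. This is why a common reference event rate makes AUPR values from cohorts with different event rates easier to compare.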
fulltext	fulltext
identifier	ISSN: 1532-0464
ispartof	Journal of biomedical informatics, 2020-10, Vol.110, p.103528-103528, Article 103528
issn	1532-0464 1532-0480
language	eng
recordid	cdi_proquest_miscellaneous_2434472455
source	Elsevier ScienceDirect Journals Complete; EZB Electronic Journals Library
subjects	Early warning systems; Machine learning; Patient deterioration; Simulation; Tree-based methods; Variational autoencoder
title	Demonstrating the consequences of learning missingness patterns in early warning systems for preventative health care: A novel simulation and solution
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T16%3A05%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Demonstrating%20the%20consequences%20of%20learning%20missingness%20patterns%20in%20early%20warning%20systems%20for%20preventative%20health%20care:%20A%20novel%20simulation%20and%20solution&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=Gillies,%20Christopher%20E.&rft.date=2020-10&rft.volume=110&rft.spage=103528&rft.epage=103528&rft.pages=103528-103528&rft.artnum=103528&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2020.103528&rft_dat=%3Cproquest_cross%3E2434472455%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2434472455&rft_id=info:pmid/32795506&rft_els_id=S1532046420301568&rfr_iscdi=true