Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably

Bibliographic details
Published in:	Journal of clinical epidemiology, 2021-05, Vol.133, p.43-52
Main authors:	Cowling, Thomas E.; Cromwell, David A.; Bellot, Alexis; Sharples, Linda D.; van der Meulen, Jan
Format:	Article
Language:	English
Subjects:	Big data; Calibration; Codes; Colorectal carcinoma; Colorectal surgery; Comorbidity; Confidence intervals; Datasets; Diagnosis; Electronic health records; Epidemiology; Fractures; Health care; Health services; Heart attacks; Hip; Hospitals; International Classification of Diseases; Learning algorithms; Machine learning; Medical diagnosis; Medical prognosis; Mortality; Myocardial infarction; Patients; Prognosis; Regression analysis; Regression models; Socioeconomic factors; Socioeconomics; Statistical analysis; Statistics; Surgery
Online access:	Full text

container_end_page	52
container_issue
container_start_page	43
container_title	Journal of clinical epidemiology
container_volume	133
creator	Cowling, Thomas E.; Cromwell, David A.; Bellot, Alexis; Sharples, Linda D.; van der Meulen, Jan
description	The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015–2017. One-year mortality was predicted from patient age, sex, and socioeconomic status, and from whether each of 202 to 257 International Classification of Diseases, 10th Revision (ICD-10) codes had been recorded in the preceding year (binary predictors). Performance measures included the c-statistic, scaled Brier score, and several measures of calibration. One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817), respectively. The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall. In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.
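The diagnosis-code predictors in the abstract are binary: each records whether a given ICD-10 code was recorded for the patient in the preceding year. A minimal sketch of that encoding, assuming each patient's prior-year codes are available as a set of strings. The function name `code_indicators`, the four-code list, and the patients are hypothetical illustrations, not the study's 202 to 257 codes or its data.

```python
from typing import Dict, List, Set


def code_indicators(patients: Dict[str, Set[str]], code_list: List[str]) -> Dict[str, List[int]]:
    """Return, per patient, one 0/1 flag per code in code_list:
    1 if that code was recorded for the patient in the preceding year."""
    return {
        pid: [1 if code in codes else 0 for code in code_list]
        for pid, codes in patients.items()
    }


# Hypothetical ICD-10 codes and prior-year diagnoses, for illustration only.
CODE_LIST = ["I10", "E11", "N18", "J44"]
patients = {
    "p1": {"I10", "E11"},
    "p2": set(),                   # no diagnoses recorded in the preceding year
    "p3": {"J44", "N18", "Z99"},   # codes outside CODE_LIST are ignored
}

X = code_indicators(patients, CODE_LIST)
print(X["p3"])  # [0, 0, 1, 1]
```

Age, sex, and socioeconomic status would be appended as further columns, giving one wide predictor matrix that either model type can take as input.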
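The c-statistic in the abstract is the probability that a randomly chosen patient who died within a year received a higher predicted risk than a randomly chosen survivor, with tied predictions counted as one half; for a binary outcome it equals the area under the ROC curve. A minimal pairwise sketch on made-up data (`c_statistic` is an illustrative name, not a library function):

```python
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Concordance (c) statistic for outcomes y (1 = died, 0 = survived) and predicted risks p."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0  # concordant pair: the patient who died had the higher risk
            elif pc == pn:
                score += 0.5  # tied predictions count as half
    return score / (len(cases) * len(controls))


# Made-up outcomes and predicted risks for five patients.
y = [1, 0, 1, 0, 0]
p = [0.8, 0.3, 0.4, 0.4, 0.1]
print(c_statistic(y, p))  # 5.5 of 6 pairs concordant, about 0.917
```

The double loop is only suitable for small illustrations; for cohorts the size of those in the study, a rank-based (Mann-Whitney U) computation gives the same value in O(n log n).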
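The scaled Brier score rescales the Brier score (the mean squared difference between predicted risk and the 0/1 outcome) against a non-informative model that predicts the observed mortality for everyone. The sketch below assumes the common definition 1 - Brier / (r * (1 - r)), where r is the observed event rate; this record does not state the exact formula the authors used, and the data are made up.

```python
from typing import Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def scaled_brier(y: Sequence[int], p: Sequence[float]) -> float:
    """1 - Brier / Brier_max, where Brier_max = r * (1 - r) is the Brier score of a
    model predicting the observed event rate r for every patient (assumed definition)."""
    r = sum(y) / len(y)
    brier_max = r * (1 - r)
    if brier_max == 0:
        raise ValueError("outcome must include both deaths and survivors")
    return 1 - brier_score(y, p) / brier_max


y = [1, 0, 0, 0]           # made-up outcomes
p = [0.7, 0.2, 0.1, 0.2]   # made-up predicted risks
print(round(scaled_brier(y, p), 3))  # Brier 0.045 vs 0.1875 for the rate-only model
```

Under this definition, 0 means no improvement over predicting the overall mortality for every patient and 1 means perfect predictions.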
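The abstract mentions "several measures of calibration" without naming them. Two common ones are sketched here as assumptions, not as the study's chosen measures: the observed-to-expected (O:E) ratio of total deaths to total predicted deaths (1.0 means calibrated overall), and a grouped table comparing mean predicted risk with observed mortality across risk-ordered groups. Function names and data are illustrative.

```python
from typing import List, Sequence, Tuple


def observed_expected_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """Total observed deaths divided by total predicted (expected) deaths."""
    return sum(y) / sum(p)


def calibration_table(y: Sequence[int], p: Sequence[float], groups: int = 10) -> List[Tuple[float, float]]:
    """Sort patients by predicted risk, split them into `groups` near-equal groups,
    and return (mean predicted risk, observed mortality) for each non-empty group."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    table = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        mean_pred = sum(p[i] for i in idx) / len(idx)
        observed = sum(y[i] for i in idx) / len(idx)
        table.append((mean_pred, observed))
    return table


y = [0, 0, 1, 0, 1, 1]                 # made-up outcomes
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9]     # made-up predicted risks
print(observed_expected_ratio(y, p))   # 3 deaths observed vs 2.5 expected
for mean_pred, observed in calibration_table(y, p, groups=2):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}")
```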
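The one-year mortality percentages in the abstract follow directly from the reported death counts and cohort sizes, which gives a quick consistency check:

```python
# Deaths within one year, cohort size, and the percentage reported in the abstract.
cohorts = {
    "myocardial infarction": (34_520, 200_119, 17.2),
    "hip fracture": (46_115, 169_646, 27.2),
    "colorectal cancer surgery": (5_273, 56_515, 9.3),
}

for name, (deaths, n, reported) in cohorts.items():
    pct = 100 * deaths / n
    print(f"{name}: {pct:.1f}% (reported {reported}%)")
```

Each recomputed value rounds to the reported figure (17.2%, 27.2%, and 9.3%).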
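The abstract reports optimism-adjusted c-statistics, but this record does not say how the adjustment was made. A widely used approach is bootstrap optimism correction: refit the model on each bootstrap resample, measure how much better it scores on that resample than on the original data, and subtract the average gap from the apparent c-statistic. The sketch below assumes that approach. It uses a toy gradient-ascent logistic regression on made-up data; all names are hypothetical, it is not the study's code, and a boosted tree learner could be passed through the same `fit` argument.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[List[List[float]]], List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Pairwise concordance; ties in predicted risk count as one half."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    pairs = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in cases for b in controls)
    return pairs / (len(cases) * len(controls))


def fit_logistic(X: List[List[float]], y: List[int], steps: int = 100, lr: float = 0.5) -> Predictor:
    """Toy logistic regression fitted by batch gradient ascent on the mean log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            r = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += r
            for j, xj in enumerate(xi):
                grad[j + 1] += r * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return lambda Xnew: [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in Xnew]


def optimism_adjusted_c(X, y, fit, n_boot: int = 30, seed: int = 0) -> Tuple[float, float]:
    """Return (apparent c, optimism-adjusted c) using bootstrap optimism correction."""
    rng = random.Random(seed)
    n = len(y)
    apparent = c_statistic(y, fit(X, y)(X))
    optimisms = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # the c-statistic needs both outcomes in the resample
            model = fit(Xb, yb)
            # optimism: score on the resample the model was fitted to minus score on the original data
            optimisms.append(c_statistic(yb, model(Xb)) - c_statistic(y, model(X)))
    if not optimisms:
        raise ValueError("no usable bootstrap resamples")
    return apparent, apparent - sum(optimisms) / len(optimisms)


# Made-up data: only the first of five predictors is related to the outcome.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [1 if rng.random() < sigmoid(row[0]) else 0 for row in X]
apparent, adjusted = optimism_adjusted_c(X, y, fit_logistic)
print(f"apparent c = {apparent:.3f}, optimism-adjusted c = {adjusted:.3f}")
```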
doi_str_mv	10.1016/j.jclinepi.2020.12.018
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2473415852</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0895435620312221</els_id><sourcerecordid>2529826190</sourcerecordid><originalsourceid>FETCH-LOGICAL-c444t-3ac6c8786264ee6b807fab6e05491f1a84d51afb5fa614639ba0e675da43ccc93</originalsourceid><addsrcrecordid>eNqFkc1q3DAUhUVpaaZJXiEIuunGU_3b3rWE9AcGuknX4lq-dmVsy5U0hXn7aJiki266kUB890g6HyF3nO054-bjtJ_c7Ffc_F4wUQ7FnvHmFdnxpm4q3Qr-muxY0-pKSW2uyLuUJsZ4zWr9llxJKXUrebsj8RBGn7J3NOIYMSUfVgprTxdwv0o-nRHi6teRbhF77zL2dIPscc10CTHD7POJDjEsdIY4Ik2YEw0D7T2Ma0g-URd6PK_LBhG6-XRD3gwwJ7x93q_Jzy8Pj_ffqsOPr9_vPx8qp5TKlQRnXPmMEUYhmq5h9QCdQaZVywcOjeo1h6HTAxiujGw7YGhq3YOSzrlWXpMPl9wtht9HTNkuPjmcZ1gxHJMVqpaK60aLgr7_B53CMa7ldVZo0TbC8JYVylwoF0NKEQe7Rb9APFnO7NmKneyLFXu2YrmwxUoZvHuOP3YL9n_HXjQU4NMFwNLHH4_RJlcqdqXxiC7bPvj_3fEE9VejYQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2529826190</pqid></control><display><type>article</type><title>Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably</title><source>Access via ScienceDirect (Elsevier)</source><source>ProQuest Central UK/Ireland</source><creator>Cowling, Thomas E. ; Cromwell, David A. ; Bellot, Alexis ; Sharples, Linda D. ; van der Meulen, Jan</creator><creatorcontrib>Cowling, Thomas E. ; Cromwell, David A. ; Bellot, Alexis ; Sharples, Linda D. ; van der Meulen, Jan</creatorcontrib><description>The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015–2017. 
One-year mortality was predicted from patient age, sex, and socioeconomic status, and 202 to 257 International Classification of Diseases 10th Revision codes recorded in the preceding year or not (binary predictors). Performance measures included the c-statistic, scaled Brier score, and several measures of calibration. One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall. In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.</description><identifier>ISSN: 0895-4356</identifier><identifier>EISSN: 1878-5921</identifier><identifier>DOI: 10.1016/j.jclinepi.2020.12.018</identifier><identifier>PMID: 33359319</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Big data ; Calibration ; Codes ; Colorectal carcinoma ; Colorectal surgery ; Comorbidity ; Confidence intervals ; Datasets ; Diagnosis ; Electronic health records ; Epidemiology ; Fractures ; Health care ; Health services ; Heart attacks ; Hip ; Hospitals ; International Classification of Diseases ; Learning algorithms ; Machine learning ; Medical diagnosis ; Medical prognosis ; Mortality ; Myocardial infarction ; Patients ; Prognosis ; Regression analysis ; Regression models ; Socioeconomic factors ; Socioeconomics ; Statistical analysis ; Statistics ; Surgery</subject><ispartof>Journal of clinical epidemiology, 2021-05, Vol.133, 
p.43-52</ispartof><rights>2020 Elsevier Inc.</rights><rights>Copyright © 2020 Elsevier Inc. All rights reserved.</rights><rights>2020. Elsevier Inc.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c444t-3ac6c8786264ee6b807fab6e05491f1a84d51afb5fa614639ba0e675da43ccc93</citedby><cites>FETCH-LOGICAL-c444t-3ac6c8786264ee6b807fab6e05491f1a84d51afb5fa614639ba0e675da43ccc93</cites><orcidid>0000-0003-1524-4393 ; 0000-0002-9451-2335</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2529826190?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995,64385,64387,64389,72469</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33359319$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Cowling, Thomas E.</creatorcontrib><creatorcontrib>Cromwell, David A.</creatorcontrib><creatorcontrib>Bellot, Alexis</creatorcontrib><creatorcontrib>Sharples, Linda D.</creatorcontrib><creatorcontrib>van der Meulen, Jan</creatorcontrib><title>Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably</title><title>Journal of clinical epidemiology</title><addtitle>J Clin Epidemiol</addtitle><description>The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015–2017. 
One-year mortality was predicted from patient age, sex, and socioeconomic status, and 202 to 257 International Classification of Diseases 10th Revision codes recorded in the preceding year or not (binary predictors). Performance measures included the c-statistic, scaled Brier score, and several measures of calibration. One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall. In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.</description><subject>Big data</subject><subject>Calibration</subject><subject>Codes</subject><subject>Colorectal carcinoma</subject><subject>Colorectal surgery</subject><subject>Comorbidity</subject><subject>Confidence intervals</subject><subject>Datasets</subject><subject>Diagnosis</subject><subject>Electronic health records</subject><subject>Epidemiology</subject><subject>Fractures</subject><subject>Health care</subject><subject>Health services</subject><subject>Heart attacks</subject><subject>Hip</subject><subject>Hospitals</subject><subject>International Classification of Diseases</subject><subject>Learning algorithms</subject><subject>Machine learning</subject><subject>Medical diagnosis</subject><subject>Medical prognosis</subject><subject>Mortality</subject><subject>Myocardial infarction</subject><subject>Patients</subject><subject>Prognosis</subject><subject>Regression analysis</subject><subject>Regression 
models</subject><subject>Socioeconomic factors</subject><subject>Socioeconomics</subject><subject>Statistical analysis</subject><subject>Statistics</subject><subject>Surgery</subject><issn>0895-4356</issn><issn>1878-5921</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNqFkc1q3DAUhUVpaaZJXiEIuunGU_3b3rWE9AcGuknX4lq-dmVsy5U0hXn7aJiki266kUB890g6HyF3nO054-bjtJ_c7Ffc_F4wUQ7FnvHmFdnxpm4q3Qr-muxY0-pKSW2uyLuUJsZ4zWr9llxJKXUrebsj8RBGn7J3NOIYMSUfVgprTxdwv0o-nRHi6teRbhF77zL2dIPscc10CTHD7POJDjEsdIY4Ik2YEw0D7T2Ma0g-URd6PK_LBhG6-XRD3gwwJ7x93q_Jzy8Pj_ffqsOPr9_vPx8qp5TKlQRnXPmMEUYhmq5h9QCdQaZVywcOjeo1h6HTAxiujGw7YGhq3YOSzrlWXpMPl9wtht9HTNkuPjmcZ1gxHJMVqpaK60aLgr7_B53CMa7ldVZo0TbC8JYVylwoF0NKEQe7Rb9APFnO7NmKneyLFXu2YrmwxUoZvHuOP3YL9n_HXjQU4NMFwNLHH4_RJlcqdqXxiC7bPvj_3fEE9VejYQ</recordid><startdate>20210501</startdate><enddate>20210501</enddate><creator>Cowling, Thomas E.</creator><creator>Cromwell, David A.</creator><creator>Bellot, Alexis</creator><creator>Sharples, Linda D.</creator><creator>van der Meulen, Jan</creator><general>Elsevier Inc</general><general>Elsevier 
Limited</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QL</scope><scope>7QP</scope><scope>7RV</scope><scope>7T2</scope><scope>7T7</scope><scope>7TK</scope><scope>7U7</scope><scope>7U9</scope><scope>7X7</scope><scope>7XB</scope><scope>88C</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>C1K</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>H94</scope><scope>K9.</scope><scope>KB0</scope><scope>M0S</scope><scope>M0T</scope><scope>M1P</scope><scope>M2O</scope><scope>M7N</scope><scope>MBDVC</scope><scope>NAPCQ</scope><scope>P64</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-1524-4393</orcidid><orcidid>https://orcid.org/0000-0002-9451-2335</orcidid></search><sort><creationdate>20210501</creationdate><title>Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably</title><author>Cowling, Thomas E. ; Cromwell, David A. ; Bellot, Alexis ; Sharples, Linda D. 
; van der Meulen, Jan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c444t-3ac6c8786264ee6b807fab6e05491f1a84d51afb5fa614639ba0e675da43ccc93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Big data</topic><topic>Calibration</topic><topic>Codes</topic><topic>Colorectal carcinoma</topic><topic>Colorectal surgery</topic><topic>Comorbidity</topic><topic>Confidence intervals</topic><topic>Datasets</topic><topic>Diagnosis</topic><topic>Electronic health records</topic><topic>Epidemiology</topic><topic>Fractures</topic><topic>Health care</topic><topic>Health services</topic><topic>Heart attacks</topic><topic>Hip</topic><topic>Hospitals</topic><topic>International Classification of Diseases</topic><topic>Learning algorithms</topic><topic>Machine learning</topic><topic>Medical diagnosis</topic><topic>Medical prognosis</topic><topic>Mortality</topic><topic>Myocardial infarction</topic><topic>Patients</topic><topic>Prognosis</topic><topic>Regression analysis</topic><topic>Regression models</topic><topic>Socioeconomic factors</topic><topic>Socioeconomics</topic><topic>Statistical analysis</topic><topic>Statistics</topic><topic>Surgery</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cowling, Thomas E.</creatorcontrib><creatorcontrib>Cromwell, David A.</creatorcontrib><creatorcontrib>Bellot, Alexis</creatorcontrib><creatorcontrib>Sharples, Linda D.</creatorcontrib><creatorcontrib>van der Meulen, Jan</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Proquest Nursing &amp; Allied Health Source</collection><collection>Health and Safety Science Abstracts (Full archive)</collection><collection>Industrial and 
Applied Microbiology Abstracts (Microbiology A)</collection><collection>Neurosciences Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Healthcare Administration Database (Alumni)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Nursing &amp; Allied Health Database (Alumni Edition)</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Healthcare Administration Database</collection><collection>Medical Database</collection><collection>Research Library</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology 
C)</collection><collection>Research Library (Corporate)</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of clinical epidemiology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cowling, Thomas E.</au><au>Cromwell, David A.</au><au>Bellot, Alexis</au><au>Sharples, Linda D.</au><au>van der Meulen, Jan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably</atitle><jtitle>Journal of clinical epidemiology</jtitle><addtitle>J Clin Epidemiol</addtitle><date>2021-05-01</date><risdate>2021</risdate><volume>133</volume><spage>43</spage><epage>52</epage><pages>43-52</pages><issn>0895-4356</issn><eissn>1878-5921</eissn><abstract>The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015–2017. One-year mortality was predicted from patient age, sex, and socioeconomic status, and 202 to 257 International Classification of Diseases 10th Revision codes recorded in the preceding year or not (binary predictors). Performance measures included the c-statistic, scaled Brier score, and several measures of calibration. 
One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall. In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>33359319</pmid><doi>10.1016/j.jclinepi.2020.12.018</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-1524-4393</orcidid><orcidid>https://orcid.org/0000-0002-9451-2335</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0895-4356
ispartof	Journal of clinical epidemiology, 2021-05, Vol.133, p.43-52
issn	0895-4356; 1878-5921
language	eng
recordid	cdi_proquest_miscellaneous_2473415852
source	Access via ScienceDirect (Elsevier); ProQuest Central UK/Ireland
subjects	Big data; Calibration; Codes; Colorectal carcinoma; Colorectal surgery; Comorbidity; Confidence intervals; Datasets; Diagnosis; Electronic health records; Epidemiology; Fractures; Health care; Health services; Heart attacks; Hip; Hospitals; International Classification of Diseases; Learning algorithms; Machine learning; Medical diagnosis; Medical prognosis; Mortality; Myocardial infarction; Patients; Prognosis; Regression analysis; Regression models; Socioeconomic factors; Socioeconomics; Statistical analysis; Statistics; Surgery
title	Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T12%3A19%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Logistic%20regression%20and%20machine%20learning%20predicted%20patient%20mortality%20from%20large%20sets%20of%20diagnosis%20codes%20comparably&rft.jtitle=Journal%20of%20clinical%20epidemiology&rft.au=Cowling,%20Thomas%20E.&rft.date=2021-05-01&rft.volume=133&rft.spage=43&rft.epage=52&rft.pages=43-52&rft.issn=0895-4356&rft.eissn=1878-5921&rft_id=info:doi/10.1016/j.jclinepi.2020.12.018&rft_dat=%3Cproquest_cross%3E2529826190%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2529826190&rft_id=info:pmid/33359319&rft_els_id=S0895435620312221&rfr_iscdi=true