Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably

The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infar...

Bibliographic details
Published in: Journal of clinical epidemiology 2021-05, Vol.133, p.43-52
Main authors: Cowling, Thomas E., Cromwell, David A., Bellot, Alexis, Sharples, Linda D., van der Meulen, Jan
Format: Article
Language: eng
Online access: Full text
container_end_page 52
container_issue
container_start_page 43
container_title Journal of clinical epidemiology
container_volume 133
creator Cowling, Thomas E.
Cromwell, David A.
Bellot, Alexis
Sharples, Linda D.
van der Meulen, Jan
description The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015–2017. One-year mortality was predicted from patient age, sex, and socioeconomic status, and 202 to 257 International Classification of Diseases 10th Revision codes recorded in the preceding year or not (binary predictors). Performance measures included the c-statistic, scaled Brier score, and several measures of calibration. One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall. In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.
doi_str_mv 10.1016/j.jclinepi.2020.12.018
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2473415852</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0895435620312221</els_id><sourcerecordid>2529826190</sourcerecordid><originalsourceid>FETCH-LOGICAL-c444t-3ac6c8786264ee6b807fab6e05491f1a84d51afb5fa614639ba0e675da43ccc93</originalsourceid><addsrcrecordid>eNqFkc1q3DAUhUVpaaZJXiEIuunGU_3b3rWE9AcGuknX4lq-dmVsy5U0hXn7aJiki266kUB890g6HyF3nO054-bjtJ_c7Ffc_F4wUQ7FnvHmFdnxpm4q3Qr-muxY0-pKSW2uyLuUJsZ4zWr9llxJKXUrebsj8RBGn7J3NOIYMSUfVgprTxdwv0o-nRHi6teRbhF77zL2dIPscc10CTHD7POJDjEsdIY4Ik2YEw0D7T2Ma0g-URd6PK_LBhG6-XRD3gwwJ7x93q_Jzy8Pj_ffqsOPr9_vPx8qp5TKlQRnXPmMEUYhmq5h9QCdQaZVywcOjeo1h6HTAxiujGw7YGhq3YOSzrlWXpMPl9wtht9HTNkuPjmcZ1gxHJMVqpaK60aLgr7_B53CMa7ldVZo0TbC8JYVylwoF0NKEQe7Rb9APFnO7NmKneyLFXu2YrmwxUoZvHuOP3YL9n_HXjQU4NMFwNLHH4_RJlcqdqXxiC7bPvj_3fEE9VejYQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2529826190</pqid></control><display><type>article</type><title>Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably</title><source>Access via ScienceDirect (Elsevier)</source><source>ProQuest Central UK/Ireland</source><creator>Cowling, Thomas E. ; Cromwell, David A. ; Bellot, Alexis ; Sharples, Linda D. ; van der Meulen, Jan</creator><creatorcontrib>Cowling, Thomas E. ; Cromwell, David A. ; Bellot, Alexis ; Sharples, Linda D. ; van der Meulen, Jan</creatorcontrib><description>The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015–2017. 
One-year mortality was predicted from patient age, sex, and socioeconomic status, and 202 to 257 International Classification of Diseases 10th Revision codes recorded in the preceding year or not (binary predictors). Performance measures included the c-statistic, scaled Brier score, and several measures of calibration. One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall. In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.</description><identifier>ISSN: 0895-4356</identifier><identifier>EISSN: 1878-5921</identifier><identifier>DOI: 10.1016/j.jclinepi.2020.12.018</identifier><identifier>PMID: 33359319</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Big data ; Calibration ; Codes ; Colorectal carcinoma ; Colorectal surgery ; Comorbidity ; Confidence intervals ; Datasets ; Diagnosis ; Electronic health records ; Epidemiology ; Fractures ; Health care ; Health services ; Heart attacks ; Hip ; Hospitals ; International Classification of Diseases ; Learning algorithms ; Machine learning ; Medical diagnosis ; Medical prognosis ; Mortality ; Myocardial infarction ; Patients ; Prognosis ; Regression analysis ; Regression models ; Socioeconomic factors ; Socioeconomics ; Statistical analysis ; Statistics ; Surgery</subject><ispartof>Journal of clinical epidemiology, 2021-05, Vol.133, 
p.43-52</ispartof><rights>2020 Elsevier Inc.</rights><rights>Copyright © 2020 Elsevier Inc. All rights reserved.</rights><rights>2020. Elsevier Inc.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c444t-3ac6c8786264ee6b807fab6e05491f1a84d51afb5fa614639ba0e675da43ccc93</citedby><cites>FETCH-LOGICAL-c444t-3ac6c8786264ee6b807fab6e05491f1a84d51afb5fa614639ba0e675da43ccc93</cites><orcidid>0000-0003-1524-4393 ; 0000-0002-9451-2335</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2529826190?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995,64385,64387,64389,72469</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33359319$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Cowling, Thomas E.</creatorcontrib><creatorcontrib>Cromwell, David A.</creatorcontrib><creatorcontrib>Bellot, Alexis</creatorcontrib><creatorcontrib>Sharples, Linda D.</creatorcontrib><creatorcontrib>van der Meulen, Jan</creatorcontrib><title>Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably</title><title>Journal of clinical epidemiology</title><addtitle>J Clin Epidemiol</addtitle><description>The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015–2017. 
One-year mortality was predicted from patient age, sex, and socioeconomic status, and 202 to 257 International Classification of Diseases 10th Revision codes recorded in the preceding year or not (binary predictors). Performance measures included the c-statistic, scaled Brier score, and several measures of calibration. One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall. In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.</description><subject>Big data</subject><subject>Calibration</subject><subject>Codes</subject><subject>Colorectal carcinoma</subject><subject>Colorectal surgery</subject><subject>Comorbidity</subject><subject>Confidence intervals</subject><subject>Datasets</subject><subject>Diagnosis</subject><subject>Electronic health records</subject><subject>Epidemiology</subject><subject>Fractures</subject><subject>Health care</subject><subject>Health services</subject><subject>Heart attacks</subject><subject>Hip</subject><subject>Hospitals</subject><subject>International Classification of Diseases</subject><subject>Learning algorithms</subject><subject>Machine learning</subject><subject>Medical diagnosis</subject><subject>Medical prognosis</subject><subject>Mortality</subject><subject>Myocardial infarction</subject><subject>Patients</subject><subject>Prognosis</subject><subject>Regression analysis</subject><subject>Regression 
models</subject><subject>Socioeconomic factors</subject><subject>Socioeconomics</subject><subject>Statistical analysis</subject><subject>Statistics</subject><subject>Surgery</subject><issn>0895-4356</issn><issn>1878-5921</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNqFkc1q3DAUhUVpaaZJXiEIuunGU_3b3rWE9AcGuknX4lq-dmVsy5U0hXn7aJiki266kUB890g6HyF3nO054-bjtJ_c7Ffc_F4wUQ7FnvHmFdnxpm4q3Qr-muxY0-pKSW2uyLuUJsZ4zWr9llxJKXUrebsj8RBGn7J3NOIYMSUfVgprTxdwv0o-nRHi6teRbhF77zL2dIPscc10CTHD7POJDjEsdIY4Ik2YEw0D7T2Ma0g-URd6PK_LBhG6-XRD3gwwJ7x93q_Jzy8Pj_ffqsOPr9_vPx8qp5TKlQRnXPmMEUYhmq5h9QCdQaZVywcOjeo1h6HTAxiujGw7YGhq3YOSzrlWXpMPl9wtht9HTNkuPjmcZ1gxHJMVqpaK60aLgr7_B53CMa7ldVZo0TbC8JYVylwoF0NKEQe7Rb9APFnO7NmKneyLFXu2YrmwxUoZvHuOP3YL9n_HXjQU4NMFwNLHH4_RJlcqdqXxiC7bPvj_3fEE9VejYQ</recordid><startdate>20210501</startdate><enddate>20210501</enddate><creator>Cowling, Thomas E.</creator><creator>Cromwell, David A.</creator><creator>Bellot, Alexis</creator><creator>Sharples, Linda D.</creator><creator>van der Meulen, Jan</creator><general>Elsevier Inc</general><general>Elsevier 
Limited</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QL</scope><scope>7QP</scope><scope>7RV</scope><scope>7T2</scope><scope>7T7</scope><scope>7TK</scope><scope>7U7</scope><scope>7U9</scope><scope>7X7</scope><scope>7XB</scope><scope>88C</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>C1K</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>H94</scope><scope>K9.</scope><scope>KB0</scope><scope>M0S</scope><scope>M0T</scope><scope>M1P</scope><scope>M2O</scope><scope>M7N</scope><scope>MBDVC</scope><scope>NAPCQ</scope><scope>P64</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-1524-4393</orcidid><orcidid>https://orcid.org/0000-0002-9451-2335</orcidid></search><sort><creationdate>20210501</creationdate><title>Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably</title><author>Cowling, Thomas E. ; Cromwell, David A. ; Bellot, Alexis ; Sharples, Linda D. 
; van der Meulen, Jan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c444t-3ac6c8786264ee6b807fab6e05491f1a84d51afb5fa614639ba0e675da43ccc93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Big data</topic><topic>Calibration</topic><topic>Codes</topic><topic>Colorectal carcinoma</topic><topic>Colorectal surgery</topic><topic>Comorbidity</topic><topic>Confidence intervals</topic><topic>Datasets</topic><topic>Diagnosis</topic><topic>Electronic health records</topic><topic>Epidemiology</topic><topic>Fractures</topic><topic>Health care</topic><topic>Health services</topic><topic>Heart attacks</topic><topic>Hip</topic><topic>Hospitals</topic><topic>International Classification of Diseases</topic><topic>Learning algorithms</topic><topic>Machine learning</topic><topic>Medical diagnosis</topic><topic>Medical prognosis</topic><topic>Mortality</topic><topic>Myocardial infarction</topic><topic>Patients</topic><topic>Prognosis</topic><topic>Regression analysis</topic><topic>Regression models</topic><topic>Socioeconomic factors</topic><topic>Socioeconomics</topic><topic>Statistical analysis</topic><topic>Statistics</topic><topic>Surgery</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cowling, Thomas E.</creatorcontrib><creatorcontrib>Cromwell, David A.</creatorcontrib><creatorcontrib>Bellot, Alexis</creatorcontrib><creatorcontrib>Sharples, Linda D.</creatorcontrib><creatorcontrib>van der Meulen, Jan</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Proquest Nursing &amp; Allied Health Source</collection><collection>Health and Safety Science Abstracts (Full 
archive)</collection><collection>Industrial and Applied Microbiology Abstracts (Microbiology A)</collection><collection>Neurosciences Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Healthcare Administration Database (Alumni)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Nursing &amp; Allied Health Database (Alumni Edition)</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Healthcare Administration Database</collection><collection>Medical Database</collection><collection>Research 
Library</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Research Library (Corporate)</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of clinical epidemiology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Cowling, Thomas E.</au><au>Cromwell, David A.</au><au>Bellot, Alexis</au><au>Sharples, Linda D.</au><au>van der Meulen, Jan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably</atitle><jtitle>Journal of clinical epidemiology</jtitle><addtitle>J Clin Epidemiol</addtitle><date>2021-05-01</date><risdate>2021</risdate><volume>133</volume><spage>43</spage><epage>52</epage><pages>43-52</pages><issn>0895-4356</issn><eissn>1878-5921</eissn><abstract>The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015–2017. One-year mortality was predicted from patient age, sex, and socioeconomic status, and 202 to 257 International Classification of Diseases 10th Revision codes recorded in the preceding year or not (binary predictors). 
Performance measures included the c-statistic, scaled Brier score, and several measures of calibration. One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall. In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>33359319</pmid><doi>10.1016/j.jclinepi.2020.12.018</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-1524-4393</orcidid><orcidid>https://orcid.org/0000-0002-9451-2335</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0895-4356
ispartof Journal of clinical epidemiology, 2021-05, Vol.133, p.43-52
issn 0895-4356
1878-5921
language eng
recordid cdi_proquest_miscellaneous_2473415852
source Access via ScienceDirect (Elsevier); ProQuest Central UK/Ireland
subjects Big data
Calibration
Codes
Colorectal carcinoma
Colorectal surgery
Comorbidity
Confidence intervals
Datasets
Diagnosis
Electronic health records
Epidemiology
Fractures
Health care
Health services
Heart attacks
Hip
Hospitals
International Classification of Diseases
Learning algorithms
Machine learning
Medical diagnosis
Medical prognosis
Mortality
Myocardial infarction
Patients
Prognosis
Regression analysis
Regression models
Socioeconomic factors
Socioeconomics
Statistical analysis
Statistics
Surgery
title Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T12%3A19%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Logistic%20regression%20and%20machine%20learning%20predicted%20patient%20mortality%20from%20large%20sets%20of%20diagnosis%20codes%20comparably&rft.jtitle=Journal%20of%20clinical%20epidemiology&rft.au=Cowling,%20Thomas%20E.&rft.date=2021-05-01&rft.volume=133&rft.spage=43&rft.epage=52&rft.pages=43-52&rft.issn=0895-4356&rft.eissn=1878-5921&rft_id=info:doi/10.1016/j.jclinepi.2020.12.018&rft_dat=%3Cproquest_cross%3E2529826190%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2529826190&rft_id=info:pmid/33359319&rft_els_id=S0895435620312221&rfr_iscdi=true
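
The two headline performance measures named in the description (the c-statistic and the scaled Brier score) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `c_statistic` and `scaled_brier`, the toy data, and the choice to scale the Brier score against a model that predicts the overall event rate for everyone (one common definition) are assumptions made for illustration only.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (death, survivor) pairs in which the death has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    deaths = [p for y, p in zip(y_true, y_prob) if y == 1]
    survivors = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_death in deaths:
        for p_surv in survivors:
            if p_death > p_surv:
                concordant += 1.0
            elif p_death == p_surv:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


def scaled_brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """1 - Brier / Brier_max, where Brier_max is the Brier score of a
    reference model that predicts the observed event rate for everyone
    (assumed definition; 1 is perfect, 0 is no better than the reference)."""
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    event_rate = sum(y_true) / n
    brier_max = event_rate * (1.0 - event_rate)
    return 1.0 - brier / brier_max


if __name__ == "__main__":
    # Hypothetical outcomes (1 = died within one year) and predicted risks.
    outcomes = [0, 0, 0, 1, 0, 1, 1, 0]
    risks = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.80, 0.05]
    print("c-statistic:", round(c_statistic(outcomes, risks), 3))
    print("scaled Brier:", round(scaled_brier(outcomes, risks), 3))
```

The pairwise loop is quadratic in the number of patients, so it suits only small examples; for cohorts of the size reported in the record, a rank-based calculation from an established statistics library would be the more practical choice.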