Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably
The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records. We analyzed national hospital records and official death records for patients with myocardial infar...
Published in: | Journal of clinical epidemiology 2021-05, Vol.133, p.43-52 |
---|---|
Main authors: | Cowling, Thomas E.; Cromwell, David A.; Bellot, Alexis; Sharples, Linda D.; van der Meulen, Jan |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 52 |
---|---|
container_issue | |
container_start_page | 43 |
container_title | Journal of clinical epidemiology |
container_volume | 133 |
creator | Cowling, Thomas E.; Cromwell, David A.; Bellot, Alexis; Sharples, Linda D.; van der Meulen, Jan |
description | The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records.
We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015–2017. One-year mortality was predicted from patient age, sex, and socioeconomic status, and 202 to 257 International Classification of Diseases 10th Revision codes recorded in the preceding year or not (binary predictors). Performance measures included the c-statistic, scaled Brier score, and several measures of calibration.
One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall.
In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably. |
doi_str_mv | 10.1016/j.jclinepi.2020.12.018 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0895-4356 |
ispartof | Journal of clinical epidemiology, 2021-05, Vol.133, p.43-52 |
issn | 0895-4356; 1878-5921 |
language | eng |
recordid | cdi_proquest_miscellaneous_2473415852 |
source | Access via ScienceDirect (Elsevier); ProQuest Central UK/Ireland |
subjects | Big data; Calibration; Codes; Colorectal carcinoma; Colorectal surgery; Comorbidity; Confidence intervals; Datasets; Diagnosis; Electronic health records; Epidemiology; Fractures; Health care; Health services; Heart attacks; Hip; Hospitals; International Classification of Diseases; Learning algorithms; Machine learning; Medical diagnosis; Medical prognosis; Mortality; Myocardial infarction; Patients; Prognosis; Regression analysis; Regression models; Socioeconomic factors; Socioeconomics; Statistical analysis; Statistics; Surgery |
title | Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T12%3A19%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Logistic%20regression%20and%20machine%20learning%20predicted%20patient%20mortality%20from%20large%20sets%20of%20diagnosis%20codes%20comparably&rft.jtitle=Journal%20of%20clinical%20epidemiology&rft.au=Cowling,%20Thomas%20E.&rft.date=2021-05-01&rft.volume=133&rft.spage=43&rft.epage=52&rft.pages=43-52&rft.issn=0895-4356&rft.eissn=1878-5921&rft_id=info:doi/10.1016/j.jclinepi.2020.12.018&rft_dat=%3Cproquest_cross%3E2529826190%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2529826190&rft_id=info:pmid/33359319&rft_els_id=S0895435620312221&rfr_iscdi=true |
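
The abstract names the c-statistic and the scaled Brier score as performance measures but does not define them. The following is a minimal sketch of how these two measures are commonly computed for binary outcomes; it is not taken from the paper. The function names and the toy data are hypothetical, and the scaled Brier score uses one common definition (1 minus the Brier score divided by the Brier score of a model that predicts the observed prevalence for everyone), which may differ from the exact definition the authors used.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of (death, survivor) pairs in which the death has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    deaths = [p for y, p in zip(y_true, y_prob) if y == 1]
    survivors = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_death in deaths:
        for p_survivor in survivors:
            if p_death > p_survivor:
                concordant += 1.0
            elif p_death == p_survivor:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


def scaled_brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """1 - Brier / Brier_ref, where Brier_ref is the Brier score of a
    reference model predicting the observed prevalence for every patient
    (assumed definition, see the note above)."""
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    prevalence = sum(y_true) / n
    brier_ref = prevalence * (1.0 - prevalence)
    if brier_ref == 0.0:
        raise ValueError("outcome has no variation")
    return 1.0 - brier / brier_ref


# Hypothetical toy data: 1 = died within one year, 0 = survived.
outcomes = [0, 0, 1, 1]
predicted_risk = [0.10, 0.40, 0.35, 0.80]

print(c_statistic(outcomes, predicted_risk))
print(scaled_brier(outcomes, predicted_risk))
```

The pairwise loop is quadratic in the number of patients, so it is suitable only for illustration; on cohorts of the size reported in the abstract, a rank-based computation would be the practical choice.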