Risk-Specific Training Cohorts to Address Class Imbalance in Surgical Risk Prediction

Bibliographic details
Published in: Archives of surgery (Chicago. 1960), 2024-12, Vol. 159 (12), p. 1424-1431
Main authors: Balch, Jeremy A, Ruppert, Matthew M, Guan, Ziyuan, Buchanan, Timothy R, Abbott, Kenneth L, Shickel, Benjamin, Bihorac, Azra, Liang, Muxuan, Upchurch, Gilbert R, Tignanelli, Christopher J, Loftus, Tyler J
Format: Article
Language: English
Description
Abstract:

IMPORTANCE: Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications.

OBJECTIVE: To evaluate risk-prediction model performance when trained on risk-specific cohorts.

DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109 445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined.

EXPOSURES: The model was trained de novo on separate cohorts for high-risk, medium-risk, and low-risk Current Procedural Terminology codes defined empirically by incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for complications were defined by the lower-third and upper-third prevalence in the dataset, except for mortality, for which the cutoffs were set at 1% or less and greater than 3%, respectively.

MAIN OUTCOMES AND MEASURES: Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 scores, and accuracy for each model.

RESULTS: A total of 109 445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77 921 procedures [71.2%]) and Jacksonville (31 524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4). Among 109 445 operations, 55 646 patients (50.8%) were male, and 66 495 patients (60.8%) underwent a nonemergent, inpatient operation. Training on the high-risk cohort had variable impact on AUROC but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and
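
The cohort definition described under EXPOSURES can be illustrated with a minimal Python sketch. It is not the authors' implementation: the procedure-code labels and incidence values are hypothetical, the function names are invented for illustration, and the way tertile boundaries are computed and tie-broken is an assumption. Only the mortality cutoffs (1% or less for low risk, greater than 3% for high risk) come from the abstract.

```python
from statistics import quantiles


def tertile_cutoffs(incidences):
    """Return the two cut points that split the incidences into thirds.

    Assumption: the "inclusive" interpolation method; the abstract does not
    say how the lower-third and upper-third boundaries were computed.
    """
    low_cut, high_cut = quantiles(incidences, n=3, method="inclusive")
    return low_cut, high_cut


def assign_cohort(incidence, low_cut, high_cut):
    """Label one procedure code as low, medium, or high risk."""
    if incidence <= low_cut:
        return "low"
    if incidence > high_cut:
        return "high"
    return "medium"


# Hypothetical per-code AKI incidence (illustrative values, not study data).
aki_incidence = {
    "CODE_A": 0.02, "CODE_B": 0.05, "CODE_C": 0.11,
    "CODE_D": 0.18, "CODE_E": 0.25, "CODE_F": 0.40,
}
aki_low, aki_high = tertile_cutoffs(list(aki_incidence.values()))
aki_cohorts = {
    code: assign_cohort(p, aki_low, aki_high)
    for code, p in aki_incidence.items()
}

# Mortality uses the fixed cutoffs stated in the abstract instead of tertiles.
MORTALITY_LOW_CUT = 0.01   # low risk: incidence of 1% or less
MORTALITY_HIGH_CUT = 0.03  # high risk: incidence greater than 3%

# Hypothetical per-code mortality incidence (illustrative values).
mortality_incidence = {"CODE_A": 0.004, "CODE_C": 0.02, "CODE_F": 0.06}
mortality_cohorts = {
    code: assign_cohort(p, MORTALITY_LOW_CUT, MORTALITY_HIGH_CUT)
    for code, p in mortality_incidence.items()
}

print(aki_cohorts)
print(mortality_cohorts)
```

Under this reading, each complication yields its own partition of procedure codes, so a code can fall in the high-risk cohort for one outcome and the low-risk cohort for another.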
ISSN: 2168-6254, 2168-6262
DOI: 10.1001/jamasurg.2024.4299