Automated data extraction and ensemble methods for predictive modeling of breast cancer outcomes after radiation therapy
Published in: | Medical physics (Lancaster) 2019-02, Vol.46 (2), p.1054-1063 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | English |
Abstract: | Purpose
The purpose of this study was to compare the effectiveness of ensemble methods (e.g., random forests) and single‐model methods (e.g., logistic regression and decision trees) in predictive modeling of post‐RT treatment failure and adverse events (AEs) for breast cancer patients using automatically extracted EMR data.
Methods
Data from 1967 consecutive breast radiotherapy (RT) courses at one institution between 2008 and 2015 were automatically extracted from EMRs and oncology information systems using extraction software. Over 230 variables were extracted spanning the following variable segments: patient demographics, medical/surgical history, tumor characteristics, RT treatment history, and AEs tracked using CTCAEv4.0. Treatment failure was extracted algorithmically by searching posttreatment encounters for evidence of local, nodal, or distant failure. Individual models were trained using decision trees, logistic regression, random forests, and boosted decision trees to predict treatment failures and AEs. Models were fit on 75% of the data and evaluated for probability calibration and area under the ROC curve (AUC) on the remaining test set. The impact of each variable segment was assessed by retraining without the segment and measuring change in AUC (ΔAUC).
Results
All AUC values were statistically significant (P |
---|---|
ISSN: | 0094-2405, 2473-4209 |
DOI: | 10.1002/mp.13314 |
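
The evaluation procedure described under Methods (fit on 75% of the data, score AUC on the held-out remainder, then retrain without one variable segment and record ΔAUC) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the study: the data are synthetic, the segment names and column indices are hypothetical, a small gradient-descent logistic regression stands in for the models the authors trained, and ΔAUC is taken as the AUC without the segment minus the AUC with all segments (the abstract does not state the sign convention).

```python
import math
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum identity; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=300, lr=0.5):
    """Batch gradient-descent logistic regression (illustrative stand-in model)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def holdout_auc(X, y, columns, train_frac=0.75, seed=0):
    """Fit on train_frac of the rows using only `columns`; return AUC on the rest."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # fixed seed: every refit sees the same split
    cut = int(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]

    def pick(i):
        return [X[i][c] for c in columns]

    w, b = fit_logistic([pick(i) for i in train], [y[i] for i in train])
    scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, pick(i)))) for i in test]
    return auc([y[i] for i in test], scores)


def delta_auc(X, y, segments):
    """Retrain without each segment; report AUC(without segment) - AUC(all segments)."""
    all_cols = sorted(c for cols in segments.values() for c in cols)
    base = holdout_auc(X, y, all_cols)
    return {
        name: holdout_auc(X, y, [c for c in all_cols if c not in cols]) - base
        for name, cols in segments.items()
    }


def make_toy_data(n, seed):
    """Synthetic data: column 0 drives the outcome, column 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        signal, noise = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([signal, noise])
        y.append(1 if signal + 0.3 * rng.gauss(0, 1) > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data(400, seed=1)
    # Hypothetical segment-to-column mapping, for illustration only.
    segments = {"tumor": [0], "demographics": [1]}
    for name, change in delta_auc(X, y, segments).items():
        print(f"{name}: dAUC = {change:+.3f}")
```

In this toy setup, dropping the informative segment should lower the held-out AUC substantially, while dropping the noise segment should leave it nearly unchanged. Probability calibration, which the study also evaluated, is not covered by this sketch.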