Interpretable machine learning model for digital lung cancer prescreening in Chinese populations with missing data

Bibliographic details
Published in: npj Digital Medicine, 2024-11, Vol. 7 (1), p. 327-14, Article 327
Authors: Zhang, Shuaijie, Wang, Qing, Hu, Xifeng, Zhang, Botao, Sun, Shuangshuang, Yuan, Ying, Jia, Xiaofeng, Yu, Yuanyuan, Xue, Fuzhong
Format: Article
Language: English
Description
Abstract: We developed an interpretable model, BOUND (Bayesian netwOrk for large-scale lUng caNcer Digital prescreening), using a comprehensive EHR dataset from China to improve lung cancer detection rates. BOUND employs Bayesian network uncertainty inference, which allows it to predict lung cancer risk even when data are missing and to identify high-risk factors. Developed using data from 905,194 individuals, BOUND achieved an AUC of 0.866 in internal validation, and time- and geography-based external validations yielded AUCs of 0.848 and 0.841, respectively. In datasets with 10%–70% missing data, the AUC ranged from 0.827 to 0.746. The model demonstrates strong calibration, clinical utility, and robust performance on both balanced and imbalanced datasets. A risk scorecard was also created that improves detection rates by up to 6.8 times; it is available free online (https://drzhang1.aiself.net/). BOUND enables non-radiative, cost-effective lung cancer prescreening, excels with missing data, and addresses treatment inequities in resource-limited primary healthcare settings.
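The abstract says Bayesian network inference lets BOUND estimate risk even when some inputs are missing. The sketch below illustrates the general mechanism behind that property, exact inference by enumeration: any variable not observed for a patient is simply summed out of the joint distribution. It is a toy under stated assumptions. The network structure (smoking and age both pointing to cancer, cancer pointing to cough), the names `cancer_risk`, `joint`, `P_SMOKING`, etc., and all probabilities are hypothetical illustration values. They are not BOUND's learned structure, variables, or parameters, and this is not the paper's inference engine.

```python
from itertools import product

# HYPOTHETICAL toy network, not BOUND's structure or parameters:
#   smoking -> cancer <- age60 ;  cancer -> cough
P_SMOKING = {1: 0.3, 0: 0.7}
P_AGE60 = {1: 0.4, 0: 0.6}
# P(cancer=1 | smoking, age60), keyed by (smoking, age60)
P_CANCER_GIVEN = {(0, 0): 0.01, (0, 1): 0.03, (1, 0): 0.05, (1, 1): 0.12}
# P(cough=1 | cancer)
P_COUGH_GIVEN = {0: 0.1, 1: 0.6}

VARS = ("smoking", "age60", "cancer", "cough")


def joint(a):
    """Joint probability of one full assignment, using the network factorisation
    P(S) * P(A) * P(C | S, A) * P(K | C)."""
    p_c1 = P_CANCER_GIVEN[(a["smoking"], a["age60"])]
    p_k1 = P_COUGH_GIVEN[a["cancer"]]
    return (
        P_SMOKING[a["smoking"]]
        * P_AGE60[a["age60"]]
        * (p_c1 if a["cancer"] else 1 - p_c1)
        * (p_k1 if a["cough"] else 1 - p_k1)
    )


def cancer_risk(evidence):
    """Return P(cancer=1 | evidence).

    `evidence` maps observed variable names to 0/1. Any variable left out
    (i.e. missing for this patient) is marginalised by summing over both values.
    """
    num = den = 0.0
    for values in product((0, 1), repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue  # assignment contradicts an observed value
        p = joint(a)
        den += p
        if a["cancer"] == 1:
            num += p
    return num / den


if __name__ == "__main__":
    print(cancer_risk({"smoking": 1, "age60": 1, "cough": 1}))  # fully observed
    print(cancer_risk({"smoking": 1}))  # age and cough missing
    print(cancer_risk({}))  # nothing observed: prior risk
```

With these toy numbers the risk is about 0.45 for a smoker over 60 with a cough, about 0.078 for a smoker whose age and cough status are missing, and 0.036 with no evidence at all. Enumeration is exponential in the number of variables, so a model at the scale described would rely on more efficient exact or approximate inference; the point here is only that missing inputs need no imputation step.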
ISSN: 2398-6352
DOI: 10.1038/s41746-024-01309-z