Survival Prediction in Lung Cancer Treated with Radiotherapy: Bayesian Networks vs. Support Vector Machines in Handling Missing Data

Bibliographic Details
Main Authors: Dekker, A., Dehing-Oberije, C., De Ruysscher, D., Lambin, P., Hope, A., Komati, K., Fung, G., Yu, Shipeng, De Neve, W., Lievens, Y.
Format: Conference Proceeding
Language: English
Description
Summary: Missing data is a given in the medical domain, so machine learning models should perform satisfactorily even when data is missing. Our previous work has focused on support vector machines (SVM), but we hypothesize that Bayesian networks (BN) can handle missing data better. To test the hypothesis, we trained a BN and an SVM model for 2-year survival on 322 lung cancer patients and compared their performance in three separate external datasets (35, 47, and 33 patients), each with its own characteristics in terms of missing data. The models used tumor size, clinical T and N stage, involved lymph nodes, and WHO performance status as prognostic features. We found that the BN model performed better than the SVM model (AUC 0.77, 0.72, 0.70 vs. 0.71, 0.68, 0.69), especially if tumor size was missing. We conclude that BN models are better suited for the medical domain, as they can handle missing data better.
DOI: 10.1109/ICMLA.2009.92
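
The summary does not describe the structure of the BN or how the SVM dealt with missing inputs, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the simplest possible Bayesian network (a discrete naive Bayes model, in which the outcome is the sole parent of every feature) and shows why such a model needs no imputation: a missing feature is marginalized out, which here amounts to dropping its likelihood term. The class `NaiveBayes`, the function `auc`, the feature encoding, and the six toy patients are all hypothetical and carry no clinical meaning.

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Discrete naive Bayes: outcome -> each feature (hypothetical stand-in for a BN)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (an assumption)

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        n_features = len(rows[0])
        self.class_counts = {c: 0 for c in self.classes}
        # counts[c][j][v]: training rows of class c whose feature j equals v
        self.counts = {c: [defaultdict(int) for _ in range(n_features)]
                       for c in self.classes}
        self.values = [set() for _ in range(n_features)]
        for row, y in zip(rows, labels):
            self.class_counts[y] += 1
            for j, v in enumerate(row):
                if v is None:
                    continue  # a missing value contributes no count
                self.counts[y][j][v] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, row, positive=1):
        total = sum(self.class_counts.values())
        log_post = {}
        for c in self.classes:
            lp = math.log(self.class_counts[c] / total)  # log prior
            for j, v in enumerate(row):
                if v is None:
                    # Marginalizing a missing feature: sum_v P(v | c) = 1,
                    # so its factor simply drops out of the product.
                    continue
                observed = sum(self.counts[c][j].values())
                k = max(len(self.values[j]), 1)
                num = self.counts[c][j].get(v, 0) + self.alpha
                lp += math.log(num / (observed + self.alpha * k))
            log_post[c] = lp
        # Normalize in log space for numerical stability
        m = max(log_post.values())
        z = sum(math.exp(lp - m) for lp in log_post.values())
        return math.exp(log_post[positive] - m) / z


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy records: (clinical T stage, clinical N stage, WHO performance status).
# None marks a missing value; label 1 = alive at 2 years, 0 = not.
rows = [
    ("T1", "N0", 0),
    ("T2", "N0", 1),
    ("T1", "N1", None),
    ("T4", "N2", 2),
    ("T3", "N2", 1),
    ("T4", None, 2),
]
labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayes().fit(rows, labels)

# A new patient with two of three features missing still gets a prediction.
print(model.predict_proba(("T1", None, None)))

# Training-set AUC, shown only to demonstrate the metric (not a validation result).
print(auc([model.predict_proba(r) for r in rows], labels))
```

An SVM, by contrast, needs a complete feature vector, so missing entries would have to be imputed or the record dropped before prediction. How that was done in the study is not stated in the summary.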