Performance evaluation of machine learning for breast cancer diagnosis: A case study

Bibliographic details
Published in: Informatics in Medicine Unlocked, 2022, Vol. 31, p. 101009, Article 101009
Main authors: Shanbehzadeh, Mostafa; Kazemi-Arpanahi, Hadi; Bolbolian Ghalibaf, Mohammad; Orooji, Azam
Format: Article
Language: English
Description
Abstract: Introduction: Breast cancer (BC) is one of the most common and aggressive malignancies in women worldwide. Machine learning (ML) has been reported to support rapid and cost-effective BC diagnosis. This study aimed to develop and test predictive models for BC based on women's lifestyle factors using several basic and ensemble ML classifiers. Methods: Data on 1503 suspected BC cases were retrospectively extracted from a hospital-based electronic database. First, important risk factors were identified using wrapper-J48, wrapper-SVM, wrapper-NB, logistic regression (LR), and correlation-based feature selection (CFS) methods. Then the performance of basic ML algorithms, including Naïve Bayes (NB), Bayesian network (BNeT), random forest (RF), multilayer perceptron (MLP), support vector machine (SVM), C4.5, eXtreme Gradient Boosting (XGBoost), and decision tree, and of two ensemble methods, confidence-weighted voting and voting, was compared for predicting BC before and after feature selection (FS). SPSS 20 and Weka 3.8.4 were used to analyze the data, and the ML models were also implemented in R 3.5.0. Results: The RF algorithm performed best both before and after FS, with AUCs of 0.799 and 0.798, respectively. Combining the best models with the confidence-weighted voting method improved classifier performance further and achieved the best result, an AUC of 0.80. Conclusions: The results showed that ensemble ML algorithms outperformed the basic methods. The developed models can accurately classify individuals who are at high risk for BC and can be employed as a screening tool for early BC detection.
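The abstract does not specify how the confidence-weighted voting ensemble or the AUC were computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each base classifier outputs a positive-class probability, and that a classifier's "confidence" is the distance of that probability from 0.5. The function names `auc` and `confidence_weighted_vote` are hypothetical and chosen for illustration.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confidence_weighted_vote(probs: Sequence[float]) -> float:
    """Combine per-classifier positive-class probabilities for one case.

    Assumption (not from the source): each classifier's weight is its
    confidence, defined here as 2 * |p - 0.5|, so an undecided classifier
    (p = 0.5) contributes nothing to the combined score.
    """
    weights = [2.0 * abs(p - 0.5) for p in probs]
    total = sum(weights)
    if total == 0.0:
        return 0.5  # every classifier is undecided
    return sum(w * p for w, p in zip(weights, probs)) / total


if __name__ == "__main__":
    # Toy data: rows are cases, columns are three hypothetical classifiers.
    case_probs = [[0.9, 0.6, 0.7], [0.2, 0.4, 0.1], [0.6, 0.3, 0.8], [0.4, 0.2, 0.3]]
    labels = [1, 0, 1, 0]
    ensemble_scores = [confidence_weighted_vote(p) for p in case_probs]
    print("ensemble AUC:", auc(labels, ensemble_scores))
```

Weighting by confidence lets decisive classifiers dominate the combined score; plain voting would instead give every classifier equal weight regardless of how certain it is.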
ISSN: 2352-9148
DOI: 10.1016/j.imu.2022.101009