Application of decision tree-based ensemble learning in the classification of breast cancer

Bibliographic details
Published in: Computers in Biology and Medicine, January 2021, Vol. 128, Article 104089
Authors: Ghiasi, Mohammad M.; Zendehboudi, Sohrab
Format: Article
Language: English
Description
Abstract: As a common screening and diagnostic tool, Fine Needle Aspiration Biopsy (FNAB) of suspicious breast lumps can be used to distinguish between malignant and benign breast cytology. In this study, we first review published work on breast cancer classification in which machine learning and data mining algorithms have been applied to the Wisconsin Breast Cancer Database (WBCD). We then introduce new tools based on the Random Forest (RF) and Extremely Randomized Trees, or Extra Trees (ET), algorithms to classify breast cancer. Both RF and ET use decision trees as base classifiers and combine their outputs to reach the final classification. The RF and ET approaches comprise four main stages: input identification, determination of the optimal number of trees, voting analysis, and final decision. The models implemented in this research use important factors such as uniformity of cell size, bland chromatin, mitoses, and clump thickness as input parameters. According to the statistical analysis, the proposed methods classify the type of breast cancer accurately. The error analysis shows that the designed RF and ET models produce easy-to-use outputs and offer the highest diagnostic performance compared with previous tools and models in the literature for WBCD classification. Among the input factors, uniformity of cell size has the highest relative importance and mitoses the lowest. The RF and ET algorithms are expected to play an important role in screening and diagnosis in medicine and health systems in the near future.

Highlights:
• A systematic study is conducted on breast cancer classification based on the WBCD.
• This study offers an effective visualization tool for breast cancer classification.
• The Random Forest (RF) and Extra Trees (ET) methodologies are implemented for WBCD classification.
• The presented models offer the highest diagnostic performance compared with previous models.
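The four stages named in the abstract (input identification, determination of the number of trees, voting analysis, final decision) can be illustrated with a minimal Python sketch. It is not the authors' implementation and uses no real WBCD records: the toy data, the feature list, and every function name (`make_toy_data`, `fit_ensemble`, `choose_n_trees`, `vote`, and so on) are hypothetical. To stay short, each "tree" is reduced to a depth-1 stump and split quality is plain training accuracy rather than Gini impurity. The RF-style variant grows each stump on a bootstrap sample and tries every observed cut point; the ET-style variant uses the full training set and one uniformly random cut point per candidate feature.

```python
import random
from collections import Counter

# Hypothetical input identification: four WBCD-style cytology scores (1-10).
FEATURES = ["clump_thickness", "uniformity_of_cell_size", "bland_chromatin", "mitoses"]


def make_toy_data(n, seed):
    """Invented, perfectly separable rows; NOT real WBCD records."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        y = i % 2  # 1 = malignant, 0 = benign (balanced toy labels)
        lo, hi = (7, 10) if y else (1, 3)
        rows.append(([rng.randint(lo, hi) for _ in FEATURES], y))
    return rows


def majority(labels, default=0):
    """Most common label; `default` covers an empty side of a split."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, f, t):
    """Depth-1 tree: split feature f at threshold t, majority label per side."""
    overall = majority([y for _, y in rows])
    left = [y for x, y in rows if x[f] <= t]
    right = [y for x, y in rows if x[f] > t]
    return (f, t, majority(left, overall), majority(right, overall))


def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label


def accuracy(predict, rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)


def grow_stump(rows, rng, extra_trees, max_features=2):
    candidates = []
    for f in rng.sample(range(len(rows[0][0])), max_features):
        values = sorted({x[f] for x, _ in rows})
        if extra_trees:
            # ET-style: a single random cut point per candidate feature
            thresholds = [rng.uniform(values[0], values[-1])]
        else:
            # RF-style: every observed value is tried as a cut point
            thresholds = values
        candidates += [fit_stump(rows, f, t) for t in thresholds]
    # Simplification: split quality is training accuracy, not Gini impurity
    return max(candidates,
               key=lambda s: accuracy(lambda x: predict_stump(s, x), rows))


def fit_ensemble(rows, n_trees, extra_trees, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # RF: bootstrap sample per tree; ET: full training set
        sample = rows if extra_trees else [rng.choice(rows) for _ in rows]
        trees.append(grow_stump(sample, rng, extra_trees))
    return trees


def vote(trees, x):
    """Voting analysis and final decision: majority over tree predictions."""
    return majority([predict_stump(s, x) for s in trees])


def choose_n_trees(train, valid, extra_trees, options=(1, 5, 15, 25)):
    """Pick the tree count with the best validation accuracy (ties: fewest)."""
    scores = {}
    for n in options:
        trees = fit_ensemble(train, n, extra_trees)
        scores[n] = accuracy(lambda x: vote(trees, x), valid)
    return max(options, key=lambda n: (scores[n], -n)), scores


if __name__ == "__main__":
    train, valid = make_toy_data(200, seed=1), make_toy_data(100, seed=2)
    for extra in (False, True):
        n_best, scores = choose_n_trees(train, valid, extra)
        trees = fit_ensemble(train, n_best, extra)
        name = "ET" if extra else "RF"
        print(name, "trees:", n_best, "validation accuracy:", scores[n_best])
        print(name, "votes:", vote(trees, [1, 1, 1, 1]), vote(trees, [10, 10, 10, 10]))
```

Because the toy classes are perfectly separated, both variants should vote benign (0) for `[1, 1, 1, 1]` and malignant (1) for `[10, 10, 10, 10]`. All four toy features are equally informative, so this sketch does not reproduce the relative-importance ranking reported in the abstract; with real data, the tree-count sweep could use cross-validation instead of a single validation split.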
ISSN: 0010-4825, 1879-0534
DOI: 10.1016/j.compbiomed.2020.104089