Application of decision tree-based ensemble learning in the classification of breast cancer

As a common screening and diagnostic tool, Fine Needle Aspiration Biopsy (FNAB) of the suspicious breast lumps can be used to distinguish between malignant and benign breast cytology. In this study, we first review published works on the classification of breast cancer where the machine learning and...

Bibliographic Details
Published in: Computers in biology and medicine 2021-01, Vol.128, p.104089-104089, Article 104089
Main Authors: Ghiasi, Mohammad M., Zendehboudi, Sohrab
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page 104089
container_issue
container_start_page 104089
container_title Computers in biology and medicine
container_volume 128
creator Ghiasi, Mohammad M.
Zendehboudi, Sohrab
description As a common screening and diagnostic tool, Fine Needle Aspiration Biopsy (FNAB) of the suspicious breast lumps can be used to distinguish between malignant and benign breast cytology. In this study, we first review published works on the classification of breast cancer where the machine learning and data mining algorithms have been applied by using the Wisconsin Breast Cancer Database (WBCD). This work then introduces useful new tools, based on Random Forest (RF) and Extremely Randomized Trees or Extra Trees (ET) algorithms to classify breast cancer. The RF and ET strategies use the decision trees as proper classifiers to attain the ultimate classification. The RF and ET approaches include four main stages: input identification, determination of the optimal number of trees, voting analysis, and final decision. The models implemented in this research consider important factors such as uniformity of cell size, bland chromatin, mitoses, and clump thickness as the input parameters. According to the statistical analysis, the proposed methods are able to classify the type of breast cancer accurately. The error analysis results reveal that the designed RF and ET models offer easy-to-use outcomes and the highest diagnostic performance, compared to previous tools/models in the literature for the WBCD classification. The highest and lowest magnitudes of relative importance are attributed to the uniformity of cell size and mitoses among the factors. It is expected that the RF and ET algorithms play an important role in medicine and health systems for screening and diagnosis in the near future.
• A systematic study is conducted on classification of breast cancer based on WBCD.
• This study offers an effective visualization tool for breast cancer classification.
• The Random Forest (RF) and Extra Trees (ET) methodologies are implemented for WBCD classification.
• The presented models offer the highest diagnostic performance, compared to previous models.
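The voting analysis and final decision stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes depth-one trees with a randomly chosen feature and a randomly drawn cut point (a simplified stand-in for Extra Trees-style randomization), a fixed ensemble size of 25 instead of a tuned number of trees, and invented toy samples on a 1 to 10 scale. The names `TRAIN`, `fit_random_stump`, `stump_predict`, and `ensemble_predict` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy samples; values are invented for illustration only.
# Assumed column order: uniformity of cell size, bland chromatin,
# mitoses, clump thickness (the four inputs named in the abstract).
TRAIN = [
    ([1, 2, 1, 3], "benign"),
    ([2, 1, 1, 2], "benign"),
    ([1, 3, 1, 4], "benign"),
    ([3, 2, 1, 1], "benign"),
    ([8, 7, 3, 9], "malignant"),
    ([10, 8, 4, 7], "malignant"),
    ([7, 9, 2, 10], "malignant"),
    ([9, 10, 5, 8], "malignant"),
]


def majority(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_random_stump(data, rng):
    """Fit a one-split tree using a random feature and a random cut point."""
    feature = rng.randrange(len(data[0][0]))
    values = [x[feature] for x, _ in data]
    threshold = rng.uniform(min(values), max(values))
    left = [y for x, y in data if x[feature] <= threshold]
    right = [y for x, y in data if x[feature] > threshold]
    fallback = majority([y for _, y in data])  # used if one side is empty
    return (
        feature,
        threshold,
        majority(left) if left else fallback,
        majority(right) if right else fallback,
    )


def stump_predict(stump, x):
    """Return the vote of a single stump for sample x."""
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def ensemble_predict(stumps, x):
    """Voting analysis and final decision: one vote per tree, majority wins."""
    votes = Counter(stump_predict(s, x) for s in stumps)
    return votes.most_common(1)[0][0], votes


rng = random.Random(0)  # fixed seed so the sketch is repeatable
stumps = [fit_random_stump(TRAIN, rng) for _ in range(25)]
label, votes = ensemble_predict(stumps, [9, 8, 4, 9])
print(label, dict(votes))
```

Individual stumps whose cut point falls near the top of a feature's range can vote against the true class, which is the situation the majority vote is meant to absorb. A real study would use fully grown trees and a tuned ensemble size.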
doi_str_mv 10.1016/j.compbiomed.2020.104089
format Article
addtitle Comput Biol Med
els_id S0010482520304200
pmid 33338982
pqid 2472657300
publisher United States: Elsevier Ltd
rights Copyright © 2020 Elsevier Ltd. All rights reserved.
toplevel peer_reviewed; online_resources
fulltext fulltext
identifier ISSN: 0010-4825
ispartof Computers in biology and medicine, 2021-01, Vol.128, p.104089-104089, Article 104089
issn 0010-4825
1879-0534
language eng
recordid cdi_proquest_miscellaneous_2471461990
source Elsevier ScienceDirect Journals; ProQuest Central UK/Ireland
subjects Accuracy
Algorithms
Back propagation
Biopsy
Breast cancer
Cancer therapies
Cell size
Chromatin
Classification
Cytology
Data mining
Decision analysis
Decision trees
Diagnostic software
Diagnostic systems
Ensemble learning
Error analysis
Fuzzy sets
Investigations
Learning algorithms
Linear programming
Machine learning
Mammography
Methods
Mitosis
Neural networks
Random forest/extra trees
Screening
Statistical analysis
Support vector machines
Tumors
Wisconsin breast cancer database
title Application of decision tree-based ensemble learning in the classification of breast cancer
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T12%3A04%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Application%20of%20decision%20tree-based%20ensemble%20learning%20in%20the%20classification%20of%20breast%20cancer&rft.jtitle=Computers%20in%20biology%20and%20medicine&rft.au=Ghiasi,%20Mohammad%20M.&rft.date=2021-01&rft.volume=128&rft.spage=104089&rft.epage=104089&rft.pages=104089-104089&rft.artnum=104089&rft.issn=0010-4825&rft.eissn=1879-0534&rft_id=info:doi/10.1016/j.compbiomed.2020.104089&rft_dat=%3Cproquest_cross%3E2472657300%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2472657300&rft_id=info:pmid/33338982&rft_els_id=S0010482520304200&rfr_iscdi=true