Application of decision tree-based ensemble learning in the classification of breast cancer
As a common screening and diagnostic tool, Fine Needle Aspiration Biopsy (FNAB) of the suspicious breast lumps can be used to distinguish between malignant and benign breast cytology. In this study, we first review published works on the classification of breast cancer where the machine learning and...
Published in: | Computers in biology and medicine 2021-01, Vol.128, p.104089-104089, Article 104089 |
---|---|
Main authors: | Ghiasi, Mohammad M. ; Zendehboudi, Sohrab |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 104089 |
---|---|
container_issue | |
container_start_page | 104089 |
container_title | Computers in biology and medicine |
container_volume | 128 |
creator | Ghiasi, Mohammad M. ; Zendehboudi, Sohrab |
description | As a common screening and diagnostic tool, Fine Needle Aspiration Biopsy (FNAB) of the suspicious breast lumps can be used to distinguish between malignant and benign breast cytology. In this study, we first review published works on the classification of breast cancer where the machine learning and data mining algorithms have been applied by using the Wisconsin Breast Cancer Database (WBCD). This work then introduces useful new tools, based on Random Forest (RF) and Extremely Randomized Trees or Extra Trees (ET) algorithms to classify breast cancer. The RF and ET strategies use the decision trees as proper classifiers to attain the ultimate classification. The RF and ET approaches include four main stages: input identification, determination of the optimal number of trees, voting analysis, and final decision. The models implemented in this research consider important factors such as uniformity of cell size, bland chromatin, mitoses, and clump thickness as the input parameters. According to the statistical analysis, the proposed methods are able to classify the type of breast cancer accurately. The error analysis results reveal that the designed RF and ET models offer easy-to-use outcomes and the highest diagnostic performance, compared to previous tools/models in the literature for the WBCD classification. The highest and lowest magnitudes of relative importance are attributed to the uniformity of cell size and mitoses among the factors. It is expected that the RF and ET algorithms play an important role in medicine and health systems for screening and diagnosis in the near future.
• A systematic study is conducted on classification of breast cancer based on WBCD. • This study offers an effective visualization tool for breast cancer classification. • The Random Forest (RF) and Extra Trees (ET) methodologies are implemented for WBCD classification. • The presented models offer the highest diagnostic performance, compared to previous models. |
doi_str_mv | 10.1016/j.compbiomed.2020.104089 |
format | Article |
fullrecord | Record TN_cdi_proquest_miscellaneous_2471461990 (source ID: proquest_cross; source format: XML; Elsevier ID: S0010482520304200; ProQuest ID: 2472657300). Type: article. Abbreviated journal title: Comput Biol Med. PMID: 33338982. Publisher: Elsevier Ltd, United States. Rights: Copyright © 2020 Elsevier Ltd. All rights reserved. Peer reviewed. |
fulltext | fulltext |
identifier | ISSN: 0010-4825 |
ispartof | Computers in biology and medicine, 2021-01, Vol.128, p.104089-104089, Article 104089 |
issn | 0010-4825 (ISSN) ; 1879-0534 (EISSN) |
language | eng |
recordid | cdi_proquest_miscellaneous_2471461990 |
source | Elsevier ScienceDirect Journals; ProQuest Central UK/Ireland |
subjects | Accuracy ; Algorithms ; Back propagation ; Biopsy ; Breast cancer ; Cancer therapies ; Cell size ; Chromatin ; Classification ; Cytology ; Data mining ; Decision analysis ; Decision trees ; Diagnostic software ; Diagnostic systems ; Ensemble learning ; Error analysis ; Fuzzy sets ; Investigations ; Learning algorithms ; Linear programming ; Machine learning ; Mammography ; Methods ; Mitosis ; Neural networks ; Random forest/extra trees ; Screening ; Statistical analysis ; Support vector machines ; Tumors ; Wisconsin breast cancer database |
title | Application of decision tree-based ensemble learning in the classification of breast cancer |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T12%3A04%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Application%20of%20decision%20tree-based%20ensemble%20learning%20in%20the%20classification%20of%20breast%20cancer&rft.jtitle=Computers%20in%20biology%20and%20medicine&rft.au=Ghiasi,%20Mohammad%20M.&rft.date=2021-01&rft.volume=128&rft.spage=104089&rft.epage=104089&rft.pages=104089-104089&rft.artnum=104089&rft.issn=0010-4825&rft.eissn=1879-0534&rft_id=info:doi/10.1016/j.compbiomed.2020.104089&rft_dat=%3Cproquest_cross%3E2472657300%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2472657300&rft_id=info:pmid/33338982&rft_els_id=S0010482520304200&rfr_iscdi=true |
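
The abstract describes Random Forest (RF) and Extra Trees (ET) classifiers that reach a final decision through a voting analysis over many decision trees. The sketch below illustrates that voting idea only; it is not the authors' implementation. It assumes depth-1 trees (stumps), a fixed tree count instead of a tuned one, and a small made-up data set with two hypothetical cytology scores per sample (it is not WBCD data). The names `fit_stump`, `fit_ensemble`, and `predict` are illustrative.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, rng, randomized_threshold):
    """Fit a depth-1 tree (stump) on one randomly chosen feature."""
    feature = rng.randrange(len(X[0]))
    values = [row[feature] for row in X]
    if randomized_threshold:
        # Extra Trees flavour: draw the cut point uniformly at random.
        candidates = [rng.uniform(min(values), max(values))]
    else:
        # Random Forest flavour: search the observed values for the best cut.
        candidates = sorted(set(values))

    best = None
    for threshold in candidates:
        left = [lab for v, lab in zip(values, y) if v <= threshold]
        right = [lab for v, lab in zip(values, y) if v > threshold]
        left_label = majority(left) if left else majority(y)
        right_label = majority(right) if right else majority(y)
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_ensemble(X, y, n_trees, seed=0, extra_trees=False):
    """Train n_trees stumps; RF uses bootstrap samples, ET the full sample."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        if extra_trees:
            sample = list(range(len(X)))
        else:
            sample = [rng.randrange(len(X)) for _ in X]
        stumps.append(
            fit_stump([X[i] for i in sample], [y[i] for i in sample],
                      rng, extra_trees)
        )
    return stumps


def predict(stumps, row):
    """Collect one vote per stump and return (winning label, vote counts)."""
    votes = [left if row[f] <= t else right for f, t, left, right in stumps]
    return majority(votes), Counter(votes)


# Made-up samples: [cell-size uniformity score, clump thickness score].
X = [[1, 2], [2, 1], [1, 3], [3, 2], [8, 7], [9, 9], [7, 10], [10, 8]]
y = ["benign"] * 4 + ["malignant"] * 4

if __name__ == "__main__":
    for name, flag in (("RF", False), ("ET", True)):
        model = fit_ensemble(X, y, n_trees=25, seed=0, extra_trees=flag)
        print(name, predict(model, [2, 2]), predict(model, [9, 8]))
```

The only difference between the two variants in this sketch is where the randomness enters: the RF-style stumps see a bootstrap resample and search for the best threshold, while the ET-style stumps see every sample and accept a randomly drawn threshold. In both cases the final decision is the majority label across the votes.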