Effective treatment of imbalanced datasets in health care using modified SMOTE coupled with stacked deep learning algorithms

Health care is one of the prominent applications of predictive analytics, where proper analysis of cumulative datasets yields more accurate predictions. These datasets are often highly imbalanced, and sampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) give only moderate accuracy in...

Full description

Bibliographic details
Published in:	Applied nanoscience 2023, Vol.13 (3), p.1829-1840
Main authors:	Sowjanya, A. Mary; Mrudula, Owk
Format:	Article
Language:	eng
Subjects:	Accuracy; Algorithms; Chemistry and Materials Science; Coronaviruses; Data sampling; Datasets; Decision trees; Deep learning; Health care; Machine learning; Materials Science; Membrane Biology; Nanochemistry; Nanotechnology; Nanotechnology and Microengineering; Original; Original Article; Predictions; Predictive analytics; Sampling methods; Stacking
Online access:	Full text

container_end_page	1840
container_issue	3
container_start_page	1829
container_title	Applied nanoscience
container_volume	13
creator	Sowjanya, A. Mary; Mrudula, Owk
description	Health care is one of the prominent applications of predictive analytics, where proper analysis of cumulative datasets yields more accurate predictions. These datasets are often highly imbalanced, and sampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) give only moderate accuracy in such cases. To overcome this problem, a two-step approach is proposed. In the first step, SMOTE is modified to reduce class imbalance, giving Distance-based SMOTE (D-SMOTE) and Bi-phasic SMOTE (BP-SMOTE), which are then coupled with selective classifiers for prediction. Both BP-SMOTE and D-SMOTE show an increase in accuracy over basic SMOTE. In the second step, machine learning, deep learning and ensemble algorithms are used to develop a Stacking Ensemble Framework, in which Stacking shows a significant increase in accuracy over individual machine learning algorithms such as Decision Tree, Naïve Bayes and Neural Networks, and over ensemble techniques such as Voting, Bagging and Boosting. Two methods were developed by combining deep learning with the Stacking approach, namely Stacked CNN and Stacked RNN, which yielded a significantly higher accuracy of 96–97% compared with individual algorithms. The Framingham dataset is used for data sampling, the Wisconsin Hospital breast cancer study data for Stacked CNN, and the Novel Coronavirus 2019 dataset, relating to forecasting COVID-19 cases, for Stacked RNN.
doi_str_mv	10.1007/s13204-021-02063-4
format	Article
fullrecord	sourceid: proquest_pubme; sourcerecordid: 2626892582; pqid: 2782225117; sourcetype: Open Access Repository; recordtype: article; EISSN: 2190-5517; PMID: 35132368; publisher: Cham: Springer International Publishing; general: Springer International Publishing, Springer Nature B.V; rights: King Abdulaziz City for Science and Technology 2021; peer_reviewed; oa: free_for_read; addtitle: Appl Nanosci; tpages: 12; creationdate: 2023; collections: PubMed, CrossRef, MEDLINE - Academic, PubMed Central (Full Participant titles); linktopdf: https://link.springer.com/content/pdf/10.1007/s13204-021-02063-4; linktohtml: https://link.springer.com/10.1007/s13204-021-02063-4; backlink: https://www.ncbi.nlm.nih.gov/pubmed/35132368
fulltext	fulltext
identifier	ISSN: 2190-5509
ispartof	Applied nanoscience, 2023, Vol.13 (3), p.1829-1840
issn	2190-5509 2190-5517
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8811587
source	SpringerNature Journals
subjects	Accuracy; Algorithms; Chemistry and Materials Science; Coronaviruses; Data sampling; Datasets; Decision trees; Deep learning; Health care; Machine learning; Materials Science; Membrane Biology; Nanochemistry; Nanotechnology; Nanotechnology and Microengineering; Original; Original Article; Predictions; Predictive analytics; Sampling methods; Stacking
title	Effective treatment of imbalanced datasets in health care using modified SMOTE coupled with stacked deep learning algorithms
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T21%3A33%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effective%20treatment%20of%20imbalanced%20datasets%20in%20health%20care%20using%20modified%20SMOTE%20coupled%20with%20stacked%20deep%20learning%20algorithms&rft.jtitle=Applied%20nanoscience&rft.au=Sowjanya,%20A.%20Mary&rft.date=2023&rft.volume=13&rft.issue=3&rft.spage=1829&rft.epage=1840&rft.pages=1829-1840&rft.issn=2190-5509&rft.eissn=2190-5517&rft_id=info:doi/10.1007/s13204-021-02063-4&rft_dat=%3Cproquest_pubme%3E2626892582%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2782225117&rft_id=info:pmid/35132368&rfr_iscdi=true