Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology

Re-sampling-based methods are the most pervasive family of techniques for managing class imbalance in machine learning. The emergence of deep generative models for augmenting the under-represented class prompts a review of how well the model chosen for data a...

Detailed description

Bibliographic details
Published in:	IEEE access 2021, Vol.9, p.55879-55897
Main authors:	Mirza, Behroz; Haroon, Danish; Khan, Behraj; Padhani, Ali; Syed, Tahir Q.
Format:	Article
Language:	eng
Subjects:	Adversarial networks; anomaly detection; class imbalance; Classification; Clustering algorithms; Context modeling; Data models; Data points; deep generative models; density estimation; generative variational auto encoders; instance hardness threshold; Machine learning; machine learning best practices; Manifolds; Mathematical model; Measurement; Model testing; Recall; restricted Boltzmann machines; Sensitivity
Online access:	Full text

container_end_page	55897
container_issue
container_start_page	55879
container_title	IEEE access
container_volume	9
creator	Mirza, Behroz; Haroon, Danish; Khan, Behraj; Padhani, Ali; Syed, Tahir Q.
description	Re-sampling-based methods are the most pervasive family of techniques for managing class imbalance in machine learning. The emergence of deep generative models for augmenting the under-represented class prompts a review of how well the model chosen for data augmentation suits the metric selected to judge classification quality. This work defines that suitability by first using newly sampled data points from each generative model to bring the classes to parity, and then studying classification performance on a large set of metrics. We extend the investigation to different proportions of augmented data points to identify how sensitive each metric is to the degree of imbalance, which leads to an optimum proportion for each metric. The models used are GAN, VAE and RBM, and the metrics include Precision, Recall, F1-Score, AUC, G-Mean and Balanced Accuracy. We compare these models with established data-synthesizing counterparts on the same metrics. Deep generative models outperform the state of the art on five metrics on multiple datasets and also comprehensively surpass the baselines. This work therefore recommends the following model-metric mappings: VAE for high Precision and F1-Score, RBM for high Recall, and GAN for high AUC, G-Mean and Balanced Accuracy, each under various recommended proportions of the minority class.
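
The abstract names the evaluation metrics and the idea of augmenting the minority class up to a chosen proportion, but the record gives no formulas. The following is a minimal sketch, assuming the standard textbook definitions of Precision, Recall, F1-Score, G-Mean and Balanced Accuracy for a binary problem with the minority class as the positive class; whether the paper uses exactly these definitions is an assumption. The function names `synthetic_needed` and `imbalance_metrics` are hypothetical and do not come from the paper. AUC is left out because it needs classifier scores rather than confusion counts.

```python
import math


def synthetic_needed(n_majority, n_minority, target_ratio=1.0):
    """Number of synthetic minority samples to add so that
    minority / majority reaches target_ratio (1.0 means parity).

    Hypothetical helper: the sampling itself (GAN, VAE, RBM) is not shown.
    """
    if not 0 < target_ratio <= 1:
        raise ValueError("target_ratio must be in (0, 1]")
    target_minority = round(target_ratio * n_majority)
    # Never return a negative count if the ratio is already met.
    return max(0, target_minority - n_minority)


def imbalance_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion counts (minority = positive)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
        "balanced_accuracy": (recall + specificity) / 2,
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    for ratio in (0.25, 0.5, 1.0):
        print(ratio, synthetic_needed(900, 100, ratio))
    print(imbalance_metrics(tp=80, fp=20, fn=20, tn=880))
```

Sweeping `target_ratio` over several values and recording each metric is one way to look for the proportion at which a given metric peaks, which is the kind of calibration the abstract describes.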
doi_str_mv	10.1109/ACCESS.2021.3071389
format	Article
fullrecord	Publisher: Piscataway: IEEE; ISSN: 2169-3536; EISSN: 2169-3536; CODEN: IAECCG; DOI: 10.1109/ACCESS.2021.3071389; pages: 55879-55897 (19 pages); peer reviewed; open access (free to read); IEEE document: https://ieeexplore.ieee.org/document/9395632; ORCID: https://orcid.org/0000-0003-0638-9689, https://orcid.org/0000-0003-0985-9543; rights: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2021, Vol.9, p.55879-55897
issn	2169-3536 2169-3536
language	eng
recordid	cdi_crossref_primary_10_1109_ACCESS_2021_3071389
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	Adversarial networks; anomaly detection; class imbalance; Classification; Clustering algorithms; Context modeling; Data models; Data points; deep generative models; density estimation; generative variational auto encoders; instance hardness threshold; Machine learning; machine learning best practices; Manifolds; Mathematical model; Measurement; Model testing; Recall; restricted Boltzmann machines; Sensitivity
title	Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T00%3A18%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20Generative%20Models%20to%20Counter%20Class%20Imbalance:%20A%20Model-Metric%20Mapping%20With%20Proportion%20Calibration%20Methodology&rft.jtitle=IEEE%20access&rft.au=Mirza,%20Behroz&rft.date=2021&rft.volume=9&rft.spage=55879&rft.epage=55897&rft.pages=55879-55897&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3071389&rft_dat=%3Cproquest_cross%3E2513396185%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2513396185&rft_id=info:pmid/&rft_ieee_id=9395632&rft_doaj_id=oai_doaj_org_article_4a9d07c51a75479aac627072136355ef&rfr_iscdi=true