Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology

Re-sampling-based methods are the most pervasive family of techniques for managing class imbalance in machine learning. The emergence of deep generative models for augmenting the under-represented class prompts a review of how well the model chosen for data a...

Detailed description

Bibliographic details
Published in: IEEE Access, 2021, Vol. 9, pp. 55879-55897
Main authors: Mirza, Behroz; Haroon, Danish; Khan, Behraj; Padhani, Ali; Syed, Tahir Q.
Format: Article
Language: eng
Online access: Full text
container_end_page 55897
container_issue
container_start_page 55879
container_title IEEE access
container_volume 9
creator Mirza, Behroz
Haroon, Danish
Khan, Behraj
Padhani, Ali
Syed, Tahir Q.
description Re-sampling-based methods are the most pervasive family of techniques for managing class imbalance in machine learning. The emergence of deep generative models for augmenting the under-represented class prompts a review of how well the model chosen for data augmentation suits the metric selected to measure goodness of classification. This work defines this suitability by first using newly sampled data points from each generative model to augment the minority class to the degree of parity, and then studying classification performance on a large set of metrics. We extend the investigation to different proportions of augmented data points to identify the sensitivity of each metric to the degree of imbalance, leading to the discovery of an optimum proportion for each metric. The models used are GAN, VAE and RBM, and the metrics include Precision, Recall, F1-Score, AUC, G-Mean and Balanced Accuracy. We compare these models with established data-synthesizing counterparts on the same metrics. Deep generative models outperform the state of the art on 5 metrics on multiple datasets and also comprehensively surpass the baselines. This work therefore recommends the following model-metric mappings: VAE for high Precision and F1-Score, RBM for high Recall, and GAN for high AUC, G-Mean and Balanced Accuracy, under various recommended proportions of the minority class.
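The two quantities the description relies on, the proportion of the minority class after augmentation and the threshold-based metrics, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "proportion" means the minority share of the augmented training set (so parity corresponds to 0.5) and uses the standard textbook definitions of the metrics. The function names `synthetic_needed` and `threshold_metrics` are hypothetical. AUC is left out because it needs ranked classifier scores rather than a single confusion matrix.

```python
import math


def synthetic_needed(n_major: int, n_minor: int, target_proportion: float) -> int:
    """Synthetic minority points needed so that the minority class makes up
    `target_proportion` of the augmented set (assumed definition of proportion)."""
    if not 0.0 < target_proportion < 1.0:
        raise ValueError("target_proportion must lie strictly between 0 and 1")
    # Solve (n_minor + k) / (n_major + n_minor + k) = p for k.
    k = (target_proportion * (n_major + n_minor) - n_minor) / (1.0 - target_proportion)
    # Never return a negative count if the minority share is already high enough.
    return max(0, math.ceil(k))


def threshold_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Precision, Recall, F1-Score, G-Mean and Balanced Accuracy from a binary
    confusion matrix, with the minority class treated as the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
        "balanced_accuracy": (recall + specificity) / 2,
    }


if __name__ == "__main__":
    # Illustrative numbers only: 900 majority and 100 minority training points.
    for p in (0.25, 0.5):
        print(p, synthetic_needed(900, 100, p))
    print(threshold_metrics(tp=80, fp=20, tn=880, fn=20))
```

In a calibration loop of the kind the description outlines, one would presumably call `synthetic_needed` for each candidate proportion, draw that many samples from the chosen generative model, retrain the classifier, and record the metrics on an untouched test set.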
doi_str_mv 10.1109/ACCESS.2021.3071389
format Article
eissn 2169-3536
coden IAECCG
publisher Piscataway: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
orcid https://orcid.org/0000-0003-0638-9689; https://orcid.org/0000-0003-0985-9543
genre article
tpages 19
toplevel peer_reviewed; online_resources
oa free_for_read
ieee_id 9395632
doaj_id oai_doaj_org_article_4a9d07c51a75479aac627072136355ef
pqid 2513396185
linktohtml https://ieeexplore.ieee.org/document/9395632
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2021, Vol.9, p.55879-55897
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2021_3071389
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Adversarial networks
anomaly detection
class imbalance
Classification
Clustering algorithms
Context modeling
Data models
Data points
deep generative models
density estimation
generative variational auto encoders
instance hardness threshold
Machine learning
machine learning best practices
Manifolds
Mathematical model
Measurement
Model testing
Recall
restricted Boltzmann machines
Sensitivity
title Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T00%3A18%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20Generative%20Models%20to%20Counter%20Class%20Imbalance:%20A%20Model-Metric%20Mapping%20With%20Proportion%20Calibration%20Methodology&rft.jtitle=IEEE%20access&rft.au=Mirza,%20Behroz&rft.date=2021&rft.volume=9&rft.spage=55879&rft.epage=55897&rft.pages=55879-55897&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3071389&rft_dat=%3Cproquest_cross%3E2513396185%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2513396185&rft_id=info:pmid/&rft_ieee_id=9395632&rft_doaj_id=oai_doaj_org_article_4a9d07c51a75479aac627072136355ef&rfr_iscdi=true