Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology
Re-sampling-based methods are the most pervasive family of techniques for managing class imbalance in machine learning. The emergence of deep generative models for augmenting the size of the under-represented class prompts one to review the question of the suitability of the model chosen for data a...
Published in: | IEEE access, 2021, Vol.9, p.55879-55897 |
---|---|
Main authors: | Mirza, Behroz; Haroon, Danish; Khan, Behraj; Padhani, Ali; Syed, Tahir Q. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 55897 |
---|---|
container_issue | |
container_start_page | 55879 |
container_title | IEEE access |
container_volume | 9 |
creator | Mirza, Behroz; Haroon, Danish; Khan, Behraj; Padhani, Ali; Syed, Tahir Q. |
description | Re-sampling-based methods are the most pervasive family of techniques for managing class imbalance in machine learning. The emergence of deep generative models for augmenting the size of the under-represented class prompts one to review the question of the suitability of the model chosen for data augmentation with the metric selected for the goodness of classification. This work defines this suitability by using newly sampled data points from each generative model, first to the degree of parity, and studying classification performance on a large set of metrics. We extend the investigation to different proportions of augmented data points to identify the sensitivity of the metric to the degree of imbalance, leading to the discovery of an optimum proportion for each metric. The models used are GAN, VAE and RBM, and the metrics include Precision, Recall, F1-Score, AUC, G-Mean and Balanced Accuracy. We offer a comparison of these models with the established class of data-synthesizing counterparts on the aforementioned metrics. Deep generative models outperform the state of the art on 5 metrics on multiple datasets and also comprehensively surpass the baselines. This work thereby recommends the following model-metric mappings: VAE for high Precision and F1-Score, RBM for high Recall, and GAN for high AUC, G-Mean and Balanced Accuracy under various recommended proportions of the minority class. |
doi_str_mv | 10.1109/ACCESS.2021.3071389 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_ACCESS_2021_3071389</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9395632</ieee_id><doaj_id>oai_doaj_org_article_4a9d07c51a75479aac627072136355ef</doaj_id><sourcerecordid>2513396185</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-e8650cd953e132438b269fb7ed537c4a5f1999f8a3ae58b9cb2d86223f7a9f13</originalsourceid><addsrcrecordid>eNpNUV1r3DAQNKWFhjS_IC-CPvsqaa2vvh1umh7kaCGBPoq1vL74cCxH8hXy7-urQ-i-7DLMzA5MUVwLvhGCuy_bur65v99ILsUGuBFg3bviQgrtSlCg3_93fyyucj7yZewCKXNRPH8jmtgtjZRw7v8Q28eWhszmyOp4GmdKrB4wZ7Z7anDAMdBXtl1J5Z7m1Ae2x2nqxwP73c-P7FeKU0xzH0dW49A3Z9flXqiPsY1DPLx8Kj50OGS6et2XxcP3m4f6R3n383ZXb-_KUHE7l2S14qF1CkiArMA2UruuMdQqMKFC1QnnXGcRkJRtXGhka7WU0Bl0nYDLYrfathGPfkr9E6YXH7H3_4CYDh6XnGEgX6FruQlKoFGVcYhBS8ONFKBBKeoWr8-r15Ti84ny7I_xlMYlvZdKADgtrFpYsLJCijkn6t6-Cu7PTfm1KX9uyr82taiuV1VPRG8KB05pkPAXSQKOoQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2513396185</pqid></control><display><type>article</type><title>Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Mirza, Behroz ; Haroon, Danish ; Khan, Behraj ; Padhani, Ali ; Syed, Tahir Q.</creator><creatorcontrib>Mirza, Behroz ; Haroon, Danish ; Khan, Behraj ; Padhani, Ali ; Syed, Tahir Q.</creatorcontrib><description>The most pervasive segment of techniques in managing class imbalance in machine learning are re-sampling-based methods. 
The emergence of deep generative models for augmenting the size of the under-represented class, prompts one to review the question of the suitability of the model chosen for data augmentation with the metric selected for the-goodness-of classification. This work defines this suitability by using newly-sampled data points from each generative model first to the degree of parity, and studying classification performance on a large set of metrics. We extend the investigation to different proportions of augmented data points for identifying the sensitivity of the metric to the degree of imbalance, leading to the discovery of an optimum proportion against the metric. The models used are GAN, VAE and RBM and the metrics include Precision, Recall, F1-Score, AUC, G-Mean and Balanced Accuracy. We offer a comparison of these models with the established class of data synthesizing counterparts on the aforementioned metrics. Deep generative models outperform the state-of-the-art on 5 metrics on multiple datasets and also comprehensively surpass the baselines. 
This work thereby recommends the following model-metric mappings: VAE for high Precision and F1-Score, RBM for high Recall and GAN for high AUC, G-Mean and Balanced Accuracy under various recommended proportions of the minority class.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3071389</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Adversarial networks ; anomaly detection ; class imbalance ; Classification ; Clustering algorithms ; Context modeling ; Data models ; Data points ; deep generative models ; density estimation ; generative variational auto encoders ; instance hardness threshold ; Machine learning ; machine learning best practices ; Manifolds ; Mathematical model ; Measurement ; Model testing ; Recall ; restricted Boltzmann machines ; Sensitivity</subject><ispartof>IEEE access, 2021, Vol.9, p.55879-55897</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-e8650cd953e132438b269fb7ed537c4a5f1999f8a3ae58b9cb2d86223f7a9f13</citedby><cites>FETCH-LOGICAL-c408t-e8650cd953e132438b269fb7ed537c4a5f1999f8a3ae58b9cb2d86223f7a9f13</cites><orcidid>0000-0003-0638-9689 ; 0000-0003-0985-9543</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9395632$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Mirza, Behroz</creatorcontrib><creatorcontrib>Haroon, Danish</creatorcontrib><creatorcontrib>Khan, Behraj</creatorcontrib><creatorcontrib>Padhani, Ali</creatorcontrib><creatorcontrib>Syed, Tahir Q.</creatorcontrib><title>Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology</title><title>IEEE access</title><addtitle>Access</addtitle><description>The most pervasive segment of techniques in managing class imbalance in machine learning are re-sampling-based methods. The emergence of deep generative models for augmenting the size of the under-represented class, prompts one to review the question of the suitability of the model chosen for data augmentation with the metric selected for the-goodness-of classification. This work defines this suitability by using newly-sampled data points from each generative model first to the degree of parity, and studying classification performance on a large set of metrics. We extend the investigation to different proportions of augmented data points for identifying the sensitivity of the metric to the degree of imbalance, leading to the discovery of an optimum proportion against the metric. 
The models used are GAN, VAE and RBM and the metrics include Precision, Recall, F1-Score, AUC, G-Mean and Balanced Accuracy. We offer a comparison of these models with the established class of data synthesizing counterparts on the aforementioned metrics. Deep generative models outperform the state-of-the-art on 5 metrics on multiple datasets and also comprehensively surpass the baselines. This work thereby recommends the following model-metric mappings: VAE for high Precision and F1-Score, RBM for high Recall and GAN for high AUC, G-Mean and Balanced Accuracy under various recommended proportions of the minority class.</description><subject>Adversarial networks</subject><subject>anomaly detection</subject><subject>class imbalance</subject><subject>Classification</subject><subject>Clustering algorithms</subject><subject>Context modeling</subject><subject>Data models</subject><subject>Data points</subject><subject>deep generative models</subject><subject>density estimation</subject><subject>generative variational auto encoders</subject><subject>instance hardness threshold</subject><subject>Machine learning</subject><subject>machine learning best practices</subject><subject>Manifolds</subject><subject>Mathematical model</subject><subject>Measurement</subject><subject>Model testing</subject><subject>Recall</subject><subject>restricted Boltzmann 
machines</subject><subject>Sensitivity</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUV1r3DAQNKWFhjS_IC-CPvsqaa2vvh1umh7kaCGBPoq1vL74cCxH8hXy7-urQ-i-7DLMzA5MUVwLvhGCuy_bur65v99ILsUGuBFg3bviQgrtSlCg3_93fyyucj7yZewCKXNRPH8jmtgtjZRw7v8Q28eWhszmyOp4GmdKrB4wZ7Z7anDAMdBXtl1J5Z7m1Ae2x2nqxwP73c-P7FeKU0xzH0dW49A3Z9flXqiPsY1DPLx8Kj50OGS6et2XxcP3m4f6R3n383ZXb-_KUHE7l2S14qF1CkiArMA2UruuMdQqMKFC1QnnXGcRkJRtXGhka7WU0Bl0nYDLYrfathGPfkr9E6YXH7H3_4CYDh6XnGEgX6FruQlKoFGVcYhBS8ONFKBBKeoWr8-r15Ti84ny7I_xlMYlvZdKADgtrFpYsLJCijkn6t6-Cu7PTfm1KX9uyr82taiuV1VPRG8KB05pkPAXSQKOoQ</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Mirza, Behroz</creator><creator>Haroon, Danish</creator><creator>Khan, Behraj</creator><creator>Padhani, Ali</creator><creator>Syed, Tahir Q.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-0638-9689</orcidid><orcidid>https://orcid.org/0000-0003-0985-9543</orcidid></search><sort><creationdate>2021</creationdate><title>Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology</title><author>Mirza, Behroz ; Haroon, Danish ; Khan, Behraj ; Padhani, Ali ; Syed, Tahir Q.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-e8650cd953e132438b269fb7ed537c4a5f1999f8a3ae58b9cb2d86223f7a9f13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Adversarial networks</topic><topic>anomaly detection</topic><topic>class imbalance</topic><topic>Classification</topic><topic>Clustering algorithms</topic><topic>Context modeling</topic><topic>Data models</topic><topic>Data points</topic><topic>deep generative models</topic><topic>density estimation</topic><topic>generative variational auto encoders</topic><topic>instance hardness threshold</topic><topic>Machine learning</topic><topic>machine learning best practices</topic><topic>Manifolds</topic><topic>Mathematical model</topic><topic>Measurement</topic><topic>Model testing</topic><topic>Recall</topic><topic>restricted Boltzmann machines</topic><topic>Sensitivity</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mirza, Behroz</creatorcontrib><creatorcontrib>Haroon, Danish</creatorcontrib><creatorcontrib>Khan, Behraj</creatorcontrib><creatorcontrib>Padhani, Ali</creatorcontrib><creatorcontrib>Syed, Tahir Q.</creatorcontrib><collection>IEEE All-Society 
Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mirza, Behroz</au><au>Haroon, Danish</au><au>Khan, Behraj</au><au>Padhani, Ali</au><au>Syed, Tahir Q.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>55879</spage><epage>55897</epage><pages>55879-55897</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>The most pervasive segment of techniques in managing class imbalance in machine learning are re-sampling-based methods. 
The emergence of deep generative models for augmenting the size of the under-represented class, prompts one to review the question of the suitability of the model chosen for data augmentation with the metric selected for the-goodness-of classification. This work defines this suitability by using newly-sampled data points from each generative model first to the degree of parity, and studying classification performance on a large set of metrics. We extend the investigation to different proportions of augmented data points for identifying the sensitivity of the metric to the degree of imbalance, leading to the discovery of an optimum proportion against the metric. The models used are GAN, VAE and RBM and the metrics include Precision, Recall, F1-Score, AUC, G-Mean and Balanced Accuracy. We offer a comparison of these models with the established class of data synthesizing counterparts on the aforementioned metrics. Deep generative models outperform the state-of-the-art on 5 metrics on multiple datasets and also comprehensively surpass the baselines. This work thereby recommends the following model-metric mappings: VAE for high Precision and F1-Score, RBM for high Recall and GAN for high AUC, G-Mean and Balanced Accuracy under various recommended proportions of the minority class.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3071389</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0003-0638-9689</orcidid><orcidid>https://orcid.org/0000-0003-0985-9543</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2021, Vol.9, p.55879-55897 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_crossref_primary_10_1109_ACCESS_2021_3071389 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Adversarial networks; anomaly detection; class imbalance; Classification; Clustering algorithms; Context modeling; Data models; Data points; deep generative models; density estimation; generative variational auto encoders; instance hardness threshold; Machine learning; machine learning best practices; Manifolds; Mathematical model; Measurement; Model testing; Recall; restricted Boltzmann machines; Sensitivity |
title | Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T00%3A18%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20Generative%20Models%20to%20Counter%20Class%20Imbalance:%20A%20Model-Metric%20Mapping%20With%20Proportion%20Calibration%20Methodology&rft.jtitle=IEEE%20access&rft.au=Mirza,%20Behroz&rft.date=2021&rft.volume=9&rft.spage=55879&rft.epage=55897&rft.pages=55879-55897&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3071389&rft_dat=%3Cproquest_cross%3E2513396185%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2513396185&rft_id=info:pmid/&rft_ieee_id=9395632&rft_doaj_id=oai_doaj_org_article_4a9d07c51a75479aac627072136355ef&rfr_iscdi=true |
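
The description field names the evaluation metrics and the idea of augmenting the minority class to parity or to other proportions, but this record gives no formulas. As a minimal sketch, assuming the standard textbook definitions (G-Mean as the geometric mean of recall and specificity, Balanced Accuracy as their arithmetic mean), the Python below computes the count-based metrics from a binary confusion matrix and the number of synthetic points needed to reach a chosen minority-to-majority ratio. The function names `synthetic_count` and `imbalance_metrics` and the `target_ratio` parameter are hypothetical and are not taken from the paper; AUC is left out because it requires ranked scores rather than counts.

```python
import math


def synthetic_count(n_minority, n_majority, target_ratio=1.0):
    """Synthetic minority points needed so that minority/majority >= target_ratio.

    target_ratio=1.0 corresponds to parity. Hypothetical helper, not from the paper.
    """
    needed = math.ceil(target_ratio * n_majority) - n_minority
    return max(0, needed)


def imbalance_metrics(tp, fp, tn, fn):
    """Count-based metrics for a binary task; the minority class is 'positive'."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Geometric mean of the two per-class recalls.
        "g_mean": math.sqrt(recall * specificity),
        # Arithmetic mean of the two per-class recalls.
        "balanced_accuracy": (recall + specificity) / 2,
    }
```

For instance, with 100 minority and 900 majority points, `synthetic_count(100, 900)` asks for 800 synthetic points to reach parity. Sweeping `target_ratio` and re-evaluating `imbalance_metrics` at each step is one plausible way to look for the per-metric optimum proportion that the description mentions.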