Identifying and Correcting Label Bias in Machine Learning
Datasets often contain biases which unfairly disadvantage certain groups, and classifiers trained on such datasets can inherit these biases. In this paper, we provide a mathematical formulation of how this bias can arise. We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups. Despite the fact that we only observe the biased labels, we are able to show that the bias may nevertheless be corrected by re-weighting the data points without changing the labels. We show, with theoretical guarantees, that training on the re-weighted dataset corresponds to training on the unobserved but unbiased labels, thus leading to an unbiased machine learning classifier. Our procedure is fast and robust and can be used with virtually any learning algorithm. We evaluate on a number of standard machine learning fairness datasets and a variety of fairness notions, finding that our method outperforms standard approaches in achieving fair classification.
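As a rough illustration of the approach the abstract describes (correcting label bias purely by re-weighting examples and retraining, never altering the observed labels), here is a minimal Python sketch. It is not the paper's exact algorithm: the toy data, the demographic-parity constraint, the multiplier update, and the exponential weighting rule are all illustrative assumptions; only the overall loop (train with sample weights, measure the fairness violation, adjust the weights, retrain) reflects the idea described above.

```python
# Minimal sketch (assumed, simplified form of the reweight-and-retrain idea;
# not the exact weights or update rule derived in the paper).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000

# Toy data: protected group g, one feature x, and observed labels y that are
# biased against group 0 (some of its positives are flipped to 0).
g = rng.integers(0, 2, size=n)
x = (0.5 * g + rng.normal(size=n)).reshape(-1, 1)
y = (x[:, 0] + rng.normal(scale=0.5, size=n) > 0.25).astype(int)
y = np.where((g == 0) & (rng.random(n) < 0.15), 0, y)  # injected label bias

X = np.hstack([x, g.reshape(-1, 1)])

weights = np.ones(n)   # per-example weights; the labels themselves are never changed
lam, eta = 0.0, 1.0    # constraint multiplier and step size (illustrative choices)

for _ in range(20):
    clf = LogisticRegression().fit(X, y, sample_weight=weights)
    pred = clf.predict(X)

    # Demographic-parity violation: gap in positive prediction rates between groups.
    violation = pred[g == 0].mean() - pred[g == 1].mean()
    lam -= eta * violation

    # Turn the multiplier into positive example weights: up-weight positives in the
    # disadvantaged group, down-weight them in the other group.
    weights = np.where(g == 0, np.exp(lam * y), np.exp(-lam * y))
    weights /= weights.mean()

print("positive-rate gap after reweighting:",
      clf.predict(X)[g == 0].mean() - clf.predict(X)[g == 1].mean())
```

In the paper, the weights come with theoretical guarantees and the same template covers several fairness notions; this sketch only conveys the mechanics of training with sample weights rather than altered labels.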
Saved in:
Main authors: | Jiang, Heinrich ; Nachum, Ofir |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Jiang, Heinrich ; Nachum, Ofir |
description | Datasets often contain biases which unfairly disadvantage certain groups, and classifiers trained on such datasets can inherit these biases. In this paper, we provide a mathematical formulation of how this bias can arise. We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups. Despite the fact that we only observe the biased labels, we are able to show that the bias may nevertheless be corrected by re-weighting the data points without changing the labels. We show, with theoretical guarantees, that training on the re-weighted dataset corresponds to training on the unobserved but unbiased labels, thus leading to an unbiased machine learning classifier. Our procedure is fast and robust and can be used with virtually any learning algorithm. We evaluate on a number of standard machine learning fairness datasets and a variety of fairness notions, finding that our method outperforms standard approaches in achieving fair classification. |
doi_str_mv | 10.48550/arxiv.1901.04966 |
format | Article |
fullrecord | arXiv article record 1901_04966 (created 2019-01-15; open access, free to read; rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0; full text: https://arxiv.org/abs/1901.04966) |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.1901.04966 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_1901_04966 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Learning ; Statistics - Machine Learning |
title | Identifying and Correcting Label Bias in Machine Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T10%3A10%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Identifying%20and%20Correcting%20Label%20Bias%20in%20Machine%20Learning&rft.au=Jiang,%20Heinrich&rft.date=2019-01-15&rft_id=info:doi/10.48550/arxiv.1901.04966&rft_dat=%3Carxiv_GOX%3E1901_04966%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |