Identifying and Correcting Label Bias in Machine Learning

Datasets often contain biases which unfairly disadvantage certain groups, and classifiers trained on such datasets can inherit these biases. In this paper, we provide a mathematical formulation of how this bias can arise. We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups. Despite the fact that we only observe the biased labels, we are able to show that the bias may nevertheless be corrected by re-weighting the data points without changing the labels. We show, with theoretical guarantees, that training on the re-weighted dataset corresponds to training on the unobserved but unbiased labels, thus leading to an unbiased machine learning classifier. Our procedure is fast and robust and can be used with virtually any learning algorithm. We evaluate on a number of standard machine learning fairness datasets and a variety of fairness notions, finding that our method outperforms standard approaches in achieving fair classification.
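
The abstract describes correcting label bias by re-weighting training examples rather than editing the labels themselves. The snippet below is a minimal sketch of that general idea, assuming scikit-learn and a demographic-parity-style constraint: a classifier is repeatedly refit with per-example weights, and a single multiplier is adjusted to shrink the gap in positive-prediction rates between two groups. The synthetic data, the multiplicative weight rule, and the update step are illustrative assumptions, not the paper's exact algorithm or its theoretical guarantees.

```python
# Minimal sketch of bias correction via example re-weighting (illustrative,
# not the paper's exact procedure).  Assumes numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: five features, a binary protected attribute `group`, and
# observed labels produced by a biased labeller.
n = 4000
group = rng.integers(0, 2, size=n)
x = rng.normal(size=(n, 5))
x[:, 0] += np.where(group == 1, -0.3, 0.3)      # group correlates with a feature
score = x[:, 0] + 0.5 * x[:, 1]
y_true = score + rng.normal(scale=0.5, size=n) > 0
# The biased labeller flips some of group 1's true positives to negative.
flip = (group == 1) & y_true & (rng.random(n) < 0.3)
y = (y_true & ~flip).astype(int)

weights = np.ones(n)   # start from uniform example weights
lam = 0.0              # multiplier for the fairness constraint
eta = 1.0              # step size for the multiplier update

for _ in range(20):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(x, y, sample_weight=weights)
    pred = clf.predict(x)

    # Demographic-parity gap: difference in positive-prediction rates.
    gap = pred[group == 1].mean() - pred[group == 0].mean()

    # If group 1 is under-selected (gap < 0), upweight its positive examples
    # and downweight positive examples from group 0, and vice versa.
    lam -= eta * gap
    weights = np.exp(lam * np.where(group == 1, 1.0, -1.0) * (y == 1))
    weights *= n / weights.sum()   # keep the average weight at 1

print("final demographic-parity gap:", gap)
```

Because the correction lives entirely in the example weights, any learner that accepts a sample_weight argument (or an equivalent) could be substituted for the logistic regression in this sketch.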

Bibliographic details
Main authors: Jiang, Heinrich; Nachum, Ofir
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning; Statistics - Machine Learning
Source: arXiv.org
Published: 2019-01-15
DOI: 10.48550/arxiv.1901.04966
Online access: https://arxiv.org/abs/1901.04966