On the cost of essentially fair clusterings

Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable cl...

Bibliographic details
Main authors:	Bercea, Ioana O ; Groß, Martin ; Khuller, Samir ; Kumar, Aounon ; Rösner, Clemens ; Schmidt, Daniel R ; Schmidt, Melanie
Format:	Article
Language:	eng
Subjects:	Computer Science - Data Structures and Algorithms
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Bercea, Ioana O ; Groß, Martin ; Khuller, Samir ; Kumar, Aounon ; Rösner, Clemens ; Schmidt, Daniel R ; Schmidt, Melanie
description	Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters -- especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair $k$-center problem and a $O(t)$-approximation for the fair $k$-median problem, where $t$ is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair $k$-center. We extend and improve the known results. Firstly, we give a 5-approximation for the fair $k$-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives $k$-center, $k$-supplier, $k$-median, $k$-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.
doi_str_mv	10.48550/arxiv.1811.10319
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1811_10319</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1811_10319</sourcerecordid><originalsourceid>FETCH-LOGICAL-a679-883e865bafbfe3285f3dc9891a28fcc3118ff185128106ae434d65d5246f5bb73</originalsourceid><addsrcrecordid>eNotzrtqwzAUgGEtHUqaB-gU7cWOj2Qpx2MJuUHAi3dzLOu0AtcpkluStw-5TP_28wnxDkVeojHFkuI5_OeAADkUGqpX8VGPcvr20p3SJE8sfUp-nAINw0UyhSjd8JcmH8P4ld7EC9OQ_PzZmWi2m2a9z4717rD-PGZkV1WGqD1a0xF37LVCw7p3FVZACtk5DYDMgAYUQmHJl7rsremNKi2brlvpmVg8tndt-xvDD8VLe1O3d7W-ArO8O94</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>On the cost of essentially fair clusterings</title><source>arXiv.org</source><creator>Bercea, Ioana O ; Groß, Martin ; Khuller, Samir ; Kumar, Aounon ; Rösner, Clemens ; Schmidt, Daniel R ; Schmidt, Melanie</creator><creatorcontrib>Bercea, Ioana O ; Groß, Martin ; Khuller, Samir ; Kumar, Aounon ; Rösner, Clemens ; Schmidt, Daniel R ; Schmidt, Melanie</creatorcontrib><description>Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters -- especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair $k$-center problem and a $O(t)$-approximation for the fair $k$-median problem, where $t$ is a parameter for the fairness model. 
For multiple protected classes, the best known result is a 14-approximation for fair $k$-center. We extend and improve the known results. Firstly, we give a 5-approximation for the fair $k$-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives $k$-center, $k$-supplier, $k$-median, $k$-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.</description><identifier>DOI: 10.48550/arxiv.1811.10319</identifier><language>eng</language><subject>Computer Science - Data Structures and Algorithms</subject><creationdate>2018-11</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1811.10319$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1811.10319$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Bercea, Ioana O</creatorcontrib><creatorcontrib>Groß, Martin</creatorcontrib><creatorcontrib>Khuller, Samir</creatorcontrib><creatorcontrib>Kumar, Aounon</creatorcontrib><creatorcontrib>Rösner, Clemens</creatorcontrib><creatorcontrib>Schmidt, Daniel R</creatorcontrib><creatorcontrib>Schmidt, Melanie</creatorcontrib><title>On the 
cost of essentially fair clusterings</title><description>Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters -- especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair $k$-center problem and a $O(t)$-approximation for the fair $k$-median problem, where $t$ is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair $k$-center. We extend and improve the known results. Firstly, we give a 5-approximation for the fair $k$-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives $k$-center, $k$-supplier, $k$-median, $k$-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. 
In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.</description><subject>Computer Science - Data Structures and Algorithms</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotzrtqwzAUgGEtHUqaB-gU7cWOj2Qpx2MJuUHAi3dzLOu0AtcpkluStw-5TP_28wnxDkVeojHFkuI5_OeAADkUGqpX8VGPcvr20p3SJE8sfUp-nAINw0UyhSjd8JcmH8P4ld7EC9OQ_PzZmWi2m2a9z4717rD-PGZkV1WGqD1a0xF37LVCw7p3FVZACtk5DYDMgAYUQmHJl7rsremNKi2brlvpmVg8tndt-xvDD8VLe1O3d7W-ArO8O94</recordid><startdate>20181126</startdate><enddate>20181126</enddate><creator>Bercea, Ioana O</creator><creator>Groß, Martin</creator><creator>Khuller, Samir</creator><creator>Kumar, Aounon</creator><creator>Rösner, Clemens</creator><creator>Schmidt, Daniel R</creator><creator>Schmidt, Melanie</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20181126</creationdate><title>On the cost of essentially fair clusterings</title><author>Bercea, Ioana O ; Groß, Martin ; Khuller, Samir ; Kumar, Aounon ; Rösner, Clemens ; Schmidt, Daniel R ; Schmidt, Melanie</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a679-883e865bafbfe3285f3dc9891a28fcc3118ff185128106ae434d65d5246f5bb73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Computer Science - Data Structures and Algorithms</topic><toplevel>online_resources</toplevel><creatorcontrib>Bercea, Ioana O</creatorcontrib><creatorcontrib>Groß, Martin</creatorcontrib><creatorcontrib>Khuller, Samir</creatorcontrib><creatorcontrib>Kumar, Aounon</creatorcontrib><creatorcontrib>Rösner, Clemens</creatorcontrib><creatorcontrib>Schmidt, Daniel R</creatorcontrib><creatorcontrib>Schmidt, Melanie</creatorcontrib><collection>arXiv Computer 
Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Bercea, Ioana O</au><au>Groß, Martin</au><au>Khuller, Samir</au><au>Kumar, Aounon</au><au>Rösner, Clemens</au><au>Schmidt, Daniel R</au><au>Schmidt, Melanie</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>On the cost of essentially fair clusterings</atitle><date>2018-11-26</date><risdate>2018</risdate><abstract>Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters -- especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair $k$-center problem and a $O(t)$-approximation for the fair $k$-median problem, where $t$ is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair $k$-center. We extend and improve the known results. Firstly, we give a 5-approximation for the fair $k$-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives $k$-center, $k$-supplier, $k$-median, $k$-means and facility location. 
The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.</abstract><doi>10.48550/arxiv.1811.10319</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1811.10319
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_1811_10319
source	arXiv.org
subjects	Computer Science - Data Structures and Algorithms
title	On the cost of essentially fair clusterings
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T13%3A52%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=On%20the%20cost%20of%20essentially%20fair%20clusterings&rft.au=Bercea,%20Ioana%20O&rft.date=2018-11-26&rft_id=info:doi/10.48550/arxiv.1811.10319&rft_dat=%3Carxiv_GOX%3E1811_10319%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true