Human-Guided Fair Classification for Natural Language Processing

Text classifiers have promising applications in high-stakes tasks such as resume screening and content moderation. These classifiers must be fair and avoid discriminatory decisions by being invariant to perturbations of sensitive attributes such as gender or ethnicity. However, there is a gap between human intuition about these perturbations and the formal similarity specifications capturing them. While existing research has started to address this gap, current methods are based on hardcoded word replacements, resulting in specifications with limited expressivity or ones that fail to fully align with human intuition (e.g., in cases of asymmetric counterfactuals). This work proposes novel methods for bridging this gap by discovering expressive and intuitive individual fairness specifications. We show how to leverage unsupervised style transfer and GPT-3's zero-shot capabilities to automatically generate expressive candidate pairs of semantically similar sentences that differ along sensitive attributes. We then validate the generated pairs via an extensive crowdsourcing study, which confirms that many of these pairs align with human intuition about fairness in the context of toxicity classification. Finally, we show how limited amounts of human feedback can be leveraged to learn a similarity specification that can be used to train downstream fairness-aware models.
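
The following is a minimal, hypothetical sketch (not the authors' implementation) of the pair-generation step the abstract describes: a zero-shot prompt asks a large language model to rewrite a sentence along a sensitive attribute while preserving meaning, and a simple metric counts how often a classifier's prediction flips across such pairs. The function names (query_llm, make_rewrite_prompt, generate_candidate_pair, prediction_flips), the prompt wording, and the attribute values are illustrative assumptions, not taken from the paper.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a text-completion API (e.g. a GPT-3-style model)."""
    raise NotImplementedError("plug in your own LLM client here")

def make_rewrite_prompt(sentence: str, source_attr: str, target_attr: str) -> str:
    # Zero-shot instruction: keep meaning, tone and toxicity level unchanged,
    # change only the sensitive attribute (e.g. perceived gender of the subject).
    return (
        f"Rewrite the following sentence so that it refers to a {target_attr} "
        f"person instead of a {source_attr} person, while keeping the meaning, "
        f"tone and level of toxicity unchanged.\n"
        f"Sentence: {sentence}\n"
        f"Rewritten sentence:"
    )

def generate_candidate_pair(sentence: str, source_attr: str, target_attr: str):
    """Return (original, perturbed) as a candidate individual-fairness pair."""
    rewrite = query_llm(make_rewrite_prompt(sentence, source_attr, target_attr))
    return sentence, rewrite.strip()

def prediction_flips(classifier, pairs) -> float:
    """Fraction of pairs on which the classifier changes its label:
    a rough proxy for violations of the invariance requirement."""
    flips = sum(1 for a, b in pairs if classifier(a) != classifier(b))
    return flips / max(len(pairs), 1)

Pairs that human annotators confirm as "should be treated the same" would then feed the learned similarity specification and the fairness-aware training described above.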

Bibliographic Details
Published in: arXiv.org, 2023-03
Main authors: Dorner, Florian E; Peychev, Momchil; Konstantinov, Nikola; Goel, Naman; Ash, Elliott; Vechev, Martin
Format: Article
Language: English
Subjects: Classification; Classifiers; Natural language processing; Perturbation; Sentences; Similarity; Specifications
Online access: Full text
EISSN: 2331-8422
Source: Free E-Journals