Human-Guided Fair Classification for Natural Language Processing
Text classifiers have promising applications in high-stake tasks such as resume screening and content moderation. These classifiers must be fair and avoid discriminatory decisions by being invariant to perturbations of sensitive attributes such as gender or ethnicity. However, there is a gap between...
Published in: | arXiv.org, 2023-03 |
---|---|
Main authors: | Dorner, Florian E; Peychev, Momchil; Konstantinov, Nikola; Goel, Naman; Ash, Elliott; Vechev, Martin |
Format: | Article |
Language: | eng |
Subjects: | Classification; Classifiers; Natural language processing; Perturbation; Sentences; Similarity; Specifications |
Online access: | Full text |
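The description below explains that candidate pairs are produced by prompting GPT-3 zero-shot to rewrite a sentence along a sensitive attribute while preserving its meaning. As a rough illustration of that generation step only, here is a minimal sketch; the prompt wording, the model name, and the `generate_candidate_pair` helper are illustrative assumptions rather than the paper's actual setup, and the sketch relies on the legacy (pre-1.0) `openai` Python client.

```python
import openai  # legacy openai<1.0 client; reads OPENAI_API_KEY from the environment

# Illustrative zero-shot prompt; the paper's actual prompts are not reproduced in this record.
PROMPT = (
    'Rewrite the sentence below so that it refers to a {target} person instead of a '
    '{source} person, changing nothing else about its meaning or tone.\n'
    'Sentence: "{sentence}"\n'
    'Rewritten sentence:'
)

def generate_candidate_pair(sentence: str, source: str, target: str) -> tuple[str, str]:
    """Return (original, perturbed) as one candidate individual-fairness pair."""
    response = openai.Completion.create(
        model="text-davinci-003",  # any GPT-3-class completion model
        prompt=PROMPT.format(sentence=sentence, source=source, target=target),
        max_tokens=80,
        temperature=0.7,
    )
    perturbed = response["choices"][0]["text"].strip().strip('"')
    return sentence, perturbed

# Candidate pairs like this one would then be sent to crowdworkers for validation.
pair = generate_candidate_pair("My brother is a great cook.", "male", "female")
```

Pairs that crowdworkers accept as "should be treated alike" become training data for the similarity specification sketched at the end of this record.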
container_title | arXiv.org |
creator | Dorner, Florian E; Peychev, Momchil; Konstantinov, Nikola; Goel, Naman; Ash, Elliott; Vechev, Martin |
description | Text classifiers have promising applications in high-stake tasks such as resume screening and content moderation. These classifiers must be fair and avoid discriminatory decisions by being invariant to perturbations of sensitive attributes such as gender or ethnicity. However, there is a gap between human intuition about these perturbations and the formal similarity specifications capturing them. While existing research has started to address this gap, current methods are based on hardcoded word replacements, resulting in specifications with limited expressivity or ones that fail to fully align with human intuition (e.g., in cases of asymmetric counterfactuals). This work proposes novel methods for bridging this gap by discovering expressive and intuitive individual fairness specifications. We show how to leverage unsupervised style transfer and GPT-3's zero-shot capabilities to automatically generate expressive candidate pairs of semantically similar sentences that differ along sensitive attributes. We then validate the generated pairs via an extensive crowdsourcing study, which confirms that a lot of these pairs align with human intuition about fairness in the context of toxicity classification. Finally, we show how limited amounts of human feedback can be leveraged to learn a similarity specification that can be used to train downstream fairness-aware models. |
format | Article |
publisher | Ithaca: Cornell University Library, arXiv.org |
rights | 2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
startdate | 2023-03-16 |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-03 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2756548088 |
source | Free E-Journals |
subjects | Classification; Classifiers; Natural language processing; Perturbation; Sentences; Similarity; Specifications |
title | Human-Guided Fair Classification for Natural Language Processing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T00%3A10%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Human-Guided%20Fair%20Classification%20for%20Natural%20Language%20Processing&rft.jtitle=arXiv.org&rft.au=Dorner,%20Florian%20E&rft.date=2023-03-16&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2756548088%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2756548088&rft_id=info:pmid/&rfr_iscdi=true |
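The description above also states that limited human feedback can be leveraged to learn a similarity specification for training downstream fairness-aware models. The sketch below is one plausible reading of that step, assuming a `sentence-transformers` encoder, a logistic-regression head, and toy labels; the paper's actual specification-learning method and features may differ.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed encoder, not the paper's exact setup
from sklearn.linear_model import LogisticRegression

# Human-validated candidate pairs: 1 = "these two sentences should be treated alike",
# 0 = "treating them differently would not be unfair". Toy data for illustration only.
pairs = [
    ("My brother is a great cook.", "My sister is a great cook.", 1),
    ("He robbed a bank.",           "She baked a cake.",          0),
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
a = encoder.encode([p[0] for p in pairs])
b = encoder.encode([p[1] for p in pairs])
X = np.abs(a - b)              # simple pairwise feature: element-wise embedding gap
y = [p[2] for p in pairs]

# The learned specification: predicts whether a new pair falls under the fairness constraint.
spec = LogisticRegression(max_iter=1000).fit(X, y)

def should_be_invariant(s1: str, s2: str) -> bool:
    feats = np.abs(encoder.encode([s1]) - encoder.encode([s2]))
    return bool(spec.predict(feats)[0])
```

A downstream toxicity classifier could then be regularized, or have its training data augmented, so that it produces near-identical predictions on any pair the learned specification flags as requiring invariance.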