First-Person Fairness in Chatbots
Chatbots like ChatGPT are used for diverse purposes, ranging from resume writing to entertainment. These real-world applications are different from the institutional uses, such as resume screening or credit scoring, which have been the focus of much of AI research on fairness. Ensuring equitable tre...
Gespeichert in:
Hauptverfasser: | , , , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Eloundou, Tyna Beutel, Alex Robinson, David G Gu-Lemberg, Keren Brakman, Anna-Luisa Mishkin, Pamela Shah, Meghan Heidecke, Johannes Weng, Lilian Kalai, Adam Tauman |
description | Chatbots like ChatGPT are used for diverse purposes, ranging from resume
writing to entertainment. These real-world applications are different from the
institutional uses, such as resume screening or credit scoring, which have been
the focus of much of AI research on fairness. Ensuring equitable treatment for
all users in these first-person contexts is critical. In this work, we study
"first-person fairness," which means fairness toward the chatbot user. This
includes providing high-quality responses to all users regardless of their
identity or background and avoiding harmful stereotypes.
We propose a scalable, privacy-preserving method for evaluating one aspect of
first-person fairness across a large, heterogeneous corpus of real-world
chatbot interactions. Specifically, we assess potential bias linked to users'
names, which can serve as proxies for demographic attributes like gender or
race, in chatbot systems such as ChatGPT, which provide mechanisms for storing
and using user names. Our method leverages a second language model to privately
analyze name-sensitivity in the chatbot's responses. We verify the validity of
these annotations through independent human evaluation. Further, we show that
post-training interventions, including RL, significantly mitigate harmful
stereotypes.
Our approach also yields succinct descriptions of response differences across
tasks. For instance, in the "writing a story" task, chatbot responses show a
tendency to create protagonists whose gender matches the likely gender inferred
from the user's name. Moreover, a pattern emerges where users with
female-associated names receive responses with friendlier and simpler language
slightly more often than users with male-associated names. Finally, we provide
the system messages required for external researchers to further investigate
ChatGPT's behavior with hypothetical user profiles. |
doi_str_mv | 10.48550/arxiv.2410.19803 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2410_19803</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2410_19803</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2410_198033</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgEKGFpaGBhzMii6ZRYVl-gGpBYV5-cpuCVmFuWlFhcrZOYpOGckliTllxTzMLCmJeYUp_JCaW4GeTfXEGcPXbBh8QVFmbmJRZXxIEPjwYYaE1YBABmxKv8</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>First-Person Fairness in Chatbots</title><source>arXiv.org</source><creator>Eloundou, Tyna ; Beutel, Alex ; Robinson, David G ; Gu-Lemberg, Keren ; Brakman, Anna-Luisa ; Mishkin, Pamela ; Shah, Meghan ; Heidecke, Johannes ; Weng, Lilian ; Kalai, Adam Tauman</creator><creatorcontrib>Eloundou, Tyna ; Beutel, Alex ; Robinson, David G ; Gu-Lemberg, Keren ; Brakman, Anna-Luisa ; Mishkin, Pamela ; Shah, Meghan ; Heidecke, Johannes ; Weng, Lilian ; Kalai, Adam Tauman</creatorcontrib><description>Chatbots like ChatGPT are used for diverse purposes, ranging from resume
writing to entertainment. These real-world applications are different from the
institutional uses, such as resume screening or credit scoring, which have been
the focus of much of AI research on fairness. Ensuring equitable treatment for
all users in these first-person contexts is critical. In this work, we study
"first-person fairness," which means fairness toward the chatbot user. This
includes providing high-quality responses to all users regardless of their
identity or background and avoiding harmful stereotypes.
We propose a scalable, privacy-preserving method for evaluating one aspect of
first-person fairness across a large, heterogeneous corpus of real-world
chatbot interactions. Specifically, we assess potential bias linked to users'
names, which can serve as proxies for demographic attributes like gender or
race, in chatbot systems such as ChatGPT, which provide mechanisms for storing
and using user names. Our method leverages a second language model to privately
analyze name-sensitivity in the chatbot's responses. We verify the validity of
these annotations through independent human evaluation. Further, we show that
post-training interventions, including RL, significantly mitigate harmful
stereotypes.
Our approach also yields succinct descriptions of response differences across
tasks. For instance, in the "writing a story" task, chatbot responses show a
tendency to create protagonists whose gender matches the likely gender inferred
from the user's name. Moreover, a pattern emerges where users with
female-associated names receive responses with friendlier and simpler language
slightly more often than users with male-associated names. Finally, we provide
the system messages required for external researchers to further investigate
ChatGPT's behavior with hypothetical user profiles.</description><identifier>DOI: 10.48550/arxiv.2410.19803</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Computer Science - Computers and Society</subject><creationdate>2024-10</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2410.19803$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2410.19803$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Eloundou, Tyna</creatorcontrib><creatorcontrib>Beutel, Alex</creatorcontrib><creatorcontrib>Robinson, David G</creatorcontrib><creatorcontrib>Gu-Lemberg, Keren</creatorcontrib><creatorcontrib>Brakman, Anna-Luisa</creatorcontrib><creatorcontrib>Mishkin, Pamela</creatorcontrib><creatorcontrib>Shah, Meghan</creatorcontrib><creatorcontrib>Heidecke, Johannes</creatorcontrib><creatorcontrib>Weng, Lilian</creatorcontrib><creatorcontrib>Kalai, Adam Tauman</creatorcontrib><title>First-Person Fairness in Chatbots</title><description>Chatbots like ChatGPT are used for diverse purposes, ranging from resume
writing to entertainment. These real-world applications are different from the
institutional uses, such as resume screening or credit scoring, which have been
the focus of much of AI research on fairness. Ensuring equitable treatment for
all users in these first-person contexts is critical. In this work, we study
"first-person fairness," which means fairness toward the chatbot user. This
includes providing high-quality responses to all users regardless of their
identity or background and avoiding harmful stereotypes.
We propose a scalable, privacy-preserving method for evaluating one aspect of
first-person fairness across a large, heterogeneous corpus of real-world
chatbot interactions. Specifically, we assess potential bias linked to users'
names, which can serve as proxies for demographic attributes like gender or
race, in chatbot systems such as ChatGPT, which provide mechanisms for storing
and using user names. Our method leverages a second language model to privately
analyze name-sensitivity in the chatbot's responses. We verify the validity of
these annotations through independent human evaluation. Further, we show that
post-training interventions, including RL, significantly mitigate harmful
stereotypes.
Our approach also yields succinct descriptions of response differences across
tasks. For instance, in the "writing a story" task, chatbot responses show a
tendency to create protagonists whose gender matches the likely gender inferred
from the user's name. Moreover, a pattern emerges where users with
female-associated names receive responses with friendlier and simpler language
slightly more often than users with male-associated names. Finally, we provide
the system messages required for external researchers to further investigate
ChatGPT's behavior with hypothetical user profiles.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Computers and Society</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgEKGFpaGBhzMii6ZRYVl-gGpBYV5-cpuCVmFuWlFhcrZOYpOGckliTllxTzMLCmJeYUp_JCaW4GeTfXEGcPXbBh8QVFmbmJRZXxIEPjwYYaE1YBABmxKv8</recordid><startdate>20241016</startdate><enddate>20241016</enddate><creator>Eloundou, Tyna</creator><creator>Beutel, Alex</creator><creator>Robinson, David G</creator><creator>Gu-Lemberg, Keren</creator><creator>Brakman, Anna-Luisa</creator><creator>Mishkin, Pamela</creator><creator>Shah, Meghan</creator><creator>Heidecke, Johannes</creator><creator>Weng, Lilian</creator><creator>Kalai, Adam Tauman</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241016</creationdate><title>First-Person Fairness in Chatbots</title><author>Eloundou, Tyna ; Beutel, Alex ; Robinson, David G ; Gu-Lemberg, Keren ; Brakman, Anna-Luisa ; Mishkin, Pamela ; Shah, Meghan ; Heidecke, Johannes ; Weng, Lilian ; Kalai, Adam Tauman</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2410_198033</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Computers and Society</topic><toplevel>online_resources</toplevel><creatorcontrib>Eloundou, Tyna</creatorcontrib><creatorcontrib>Beutel, Alex</creatorcontrib><creatorcontrib>Robinson, David G</creatorcontrib><creatorcontrib>Gu-Lemberg, Keren</creatorcontrib><creatorcontrib>Brakman, Anna-Luisa</creatorcontrib><creatorcontrib>Mishkin, Pamela</creatorcontrib><creatorcontrib>Shah, Meghan</creatorcontrib><creatorcontrib>Heidecke, Johannes</creatorcontrib><creatorcontrib>Weng, Lilian</creatorcontrib><creatorcontrib>Kalai, Adam Tauman</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Eloundou, Tyna</au><au>Beutel, Alex</au><au>Robinson, David G</au><au>Gu-Lemberg, Keren</au><au>Brakman, Anna-Luisa</au><au>Mishkin, Pamela</au><au>Shah, Meghan</au><au>Heidecke, Johannes</au><au>Weng, Lilian</au><au>Kalai, Adam Tauman</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>First-Person Fairness in Chatbots</atitle><date>2024-10-16</date><risdate>2024</risdate><abstract>Chatbots like ChatGPT are used for diverse purposes, ranging from resume
writing to entertainment. These real-world applications are different from the
institutional uses, such as resume screening or credit scoring, which have been
the focus of much of AI research on fairness. Ensuring equitable treatment for
all users in these first-person contexts is critical. In this work, we study
"first-person fairness," which means fairness toward the chatbot user. This
includes providing high-quality responses to all users regardless of their
identity or background and avoiding harmful stereotypes.
We propose a scalable, privacy-preserving method for evaluating one aspect of
first-person fairness across a large, heterogeneous corpus of real-world
chatbot interactions. Specifically, we assess potential bias linked to users'
names, which can serve as proxies for demographic attributes like gender or
race, in chatbot systems such as ChatGPT, which provide mechanisms for storing
and using user names. Our method leverages a second language model to privately
analyze name-sensitivity in the chatbot's responses. We verify the validity of
these annotations through independent human evaluation. Further, we show that
post-training interventions, including RL, significantly mitigate harmful
stereotypes.
Our approach also yields succinct descriptions of response differences across
tasks. For instance, in the "writing a story" task, chatbot responses show a
tendency to create protagonists whose gender matches the likely gender inferred
from the user's name. Moreover, a pattern emerges where users with
female-associated names receive responses with friendlier and simpler language
slightly more often than users with male-associated names. Finally, we provide
the system messages required for external researchers to further investigate
ChatGPT's behavior with hypothetical user profiles.</abstract><doi>10.48550/arxiv.2410.19803</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2410.19803 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2410_19803 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Computers and Society |
title | First-Person Fairness in Chatbots |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T15%3A44%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=First-Person%20Fairness%20in%20Chatbots&rft.au=Eloundou,%20Tyna&rft.date=2024-10-16&rft_id=info:doi/10.48550/arxiv.2410.19803&rft_dat=%3Carxiv_GOX%3E2410_19803%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |