GeniL: A Multilingual Dataset on Generalizing Language

Generative language models are transforming our digital ecosystem, but they often inherit societal biases, for instance stereotypes associating certain attributes with specific identity groups. While whether and how these biases are mitigated may depend on the specific use case, being able to effectively detect instances of stereotype perpetuation is a crucial first step. Current methods to assess the presence of stereotypes in generated language rely on simple template- or co-occurrence-based measures, without accounting for the variety of sentential contexts in which they manifest. We argue that understanding the sentential context is crucial for detecting instances of generalization. We distinguish two types of generalizations from non-generalizing context ("My French friends think I am rude"): (1) language that merely mentions the presence of a generalization ("people think the French are very rude"), and (2) language that reinforces such a generalization ("as French they must be rude"). For meaningful stereotype evaluations, we need to reliably distinguish such instances of generalizations. We introduce the new task of detecting generalization in language, and build GeniL, a multilingual dataset of over 50K sentences from 9 languages (English, Arabic, Bengali, Spanish, French, Hindi, Indonesian, Malay, and Portuguese) annotated for instances of generalizations. We demonstrate that the likelihood of a co-occurrence being an instance of generalization is usually low, and varies across different languages, identity groups, and attributes. We build classifiers to detect generalization in language with an overall PR-AUC of 58.7, with varying degrees of performance across languages. Our research provides data and tools to enable a nuanced understanding of stereotype perpetuation, a crucial step towards more inclusive and responsible language technologies.
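
The abstract reports an overall PR-AUC of 58.7 for the generalization classifiers. As a rough illustration of how that metric is typically computed for a detection task of this kind, the sketch below scores a handful of hypothetical sentence-level predictions with scikit-learn. The labels, scores, and the binary generalizing vs. non-generalizing framing are assumptions for demonstration only, not the authors' evaluation code.

```python
# Illustrative sketch only (not the GeniL authors' code): hypothetical labels and
# scores showing how PR-AUC (average precision) is computed for a binary
# "does this sentence generalize about an identity group?" detector.
from sklearn.metrics import average_precision_score

# 1 = instance of generalization (mentioning or reinforcing a stereotype),
# 0 = non-generalizing context. Positives are deliberately sparse, mirroring
# the paper's finding that most identity-attribute co-occurrences do not
# actually generalize.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]

# Hypothetical classifier confidence that each sentence is a generalization.
y_score = [0.91, 0.35, 0.12, 0.40, 0.67, 0.08, 0.48, 0.82, 0.05, 0.22]

# PR-AUC is more informative than ROC-AUC here because the positive class is rare.
pr_auc = average_precision_score(y_true, y_score)
print(f"PR-AUC: {pr_auc:.3f}")
```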

Bibliographic details
Main authors: Davani, Aida Mostafazadeh; Gubbi, Sagar; Dev, Sunipa; Dave, Shachi; Prabhakaran, Vinodkumar
Format: Article
Language: English
Subjects: Computer Science - Computation and Language
Date: 2024-04-08
DOI: 10.48550/arxiv.2404.05866
Source: arXiv.org
Rights: CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)
Online access: Full text available at https://arxiv.org/abs/2404.05866