Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models
Saved in:

| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Order full text |
Abstract: Large language models (LLMs) have been shown to propagate and amplify harmful stereotypes, particularly those that disproportionately affect marginalised communities. To understand the effect of these stereotypes more comprehensively, we introduce GlobalBias, a dataset of 876k sentences incorporating 40 distinct gender-by-ethnicity groups alongside descriptors typically used in the bias literature, which enables us to study a broad set of stereotypes from around the world. We use GlobalBias to directly probe a suite of LMs via perplexity, which serves as a proxy for how certain stereotypes are represented in a model's internal representations. Following this, we generate character profiles based on given names and evaluate the prevalence of stereotypes in model outputs. We find that the demographic groups associated with various stereotypes remain consistent across model likelihoods and model outputs. Furthermore, larger models consistently display higher levels of stereotypical outputs, even when explicitly instructed not to.
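To illustrate the perplexity probe the abstract describes: perplexity under a causal LM measures how "expected" a sentence is to the model, so a systematically lower perplexity for one name-descriptor pairing than another suggests that pairing is more strongly encoded in the model's likelihoods. The following is a minimal sketch assuming a HuggingFace causal LM; the model name (`gpt2`), the template sentences, and the `perplexity` helper are illustrative assumptions, not the paper's exact GlobalBias setup.

```python
# Minimal sketch of perplexity-based stereotype probing (assumptions noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any HuggingFace causal LM as a stand-in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def perplexity(sentence: str) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # loss is mean cross-entropy
    return torch.exp(out.loss).item()

# Hypothetical probe: hold the descriptor fixed and vary the given name;
# a consistently lower perplexity for one group signals a stereotype
# reflected in the model's internal likelihoods.
for name in ["Jenny", "Jingzhen"]:
    print(name, perplexity(f"{name} is good at math."))
```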
DOI: 10.48550/arxiv.2407.06917