Language-Agnostic Bias Detection in Language Models with Bias Probing
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: Pretrained language models (PLMs) are key components in NLP, but they contain strong social biases. Quantifying these biases is challenging because current methods focusing on fill-the-mask objectives are sensitive to slight changes in input. To address this, we propose a bias probing technique called LABDet for evaluating social bias in PLMs in a robust, language-agnostic way. Using nationality as a case study, we show that LABDet "surfaces" nationality bias by training a classifier on top of a frozen PLM on non-nationality sentiment detection. We find consistent patterns of nationality bias across monolingual PLMs in six languages that align with historical and political context. We also show for English BERT that the bias surfaced by LABDet correlates well with bias in the pretraining data; thus, our work is one of the few studies that directly links pretraining data to PLM behavior. Finally, we verify LABDet's reliability and applicability to different templates and languages through an extensive set of robustness checks. We publicly share our code and dataset at https://github.com/akoksal/LABDet.
DOI: 10.48550/arxiv.2305.13302
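The abstract describes the core of the probing setup: a small classifier head is trained on sentiment data that never mentions nationality, on top of a frozen PLM, and is then applied to nationality-substituted templates so that systematic sentiment gaps surface bias. The Python sketch below illustrates that idea under stated assumptions; the model name, templates, and training examples are illustrative placeholders, not the authors' actual code or data (see the linked repository for the real implementation).

```python
# Minimal sketch of the frozen-PLM bias probing idea described in the abstract.
# All data and templates here are hypothetical placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # any monolingual PLM would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
plm = AutoModel.from_pretrained(MODEL_NAME)
plm.eval()
for p in plm.parameters():  # freeze the PLM: only the probe head is trained
    p.requires_grad = False

# Small classifier head for binary sentiment (negative=0, positive=1).
probe = torch.nn.Linear(plm.config.hidden_size, 2)

def sentence_embedding(texts):
    """Encode sentences with the frozen PLM and return [CLS] representations."""
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        return plm(**enc).last_hidden_state[:, 0]

# Step 1: train the probe on sentiment examples that mention no nationality
# (toy stand-ins for a real non-nationality sentiment dataset).
train_texts = ["The food was wonderful.", "The service was terrible."]
train_labels = torch.tensor([1, 0])
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(100):
    logits = probe(sentence_embedding(train_texts))
    loss = torch.nn.functional.cross_entropy(logits, train_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step 2: apply the probe to otherwise-neutral templates that differ only in
# the nationality term; systematic sentiment gaps across groups surface bias.
template = "This {} person is here."
nationalities = ["German", "Turkish", "French"]
with torch.no_grad():
    probs = torch.softmax(
        probe(sentence_embedding([template.format(n) for n in nationalities])),
        dim=-1,
    )[:, 1]
for nat, p in zip(nationalities, probs):
    print(f"{nat}: positive-sentiment probability = {p:.3f}")
```

Because the PLM is never fine-tuned and the sentiment training data contains no nationality mentions, any consistent score differences across nationalities in step 2 can be attributed to associations already present in the pretrained representations rather than to the probe's training signal.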