LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
Format: Article
Language: English
Abstract: Mitigating biases in machine learning models has become an increasing concern in Natural Language Processing (NLP), particularly in developing fair text embeddings, which are crucial yet challenging for real-world applications like search engines. In response, this paper proposes a novel method for learning fair text embeddings. First, we define a novel content-conditional equal distance (CCED) fairness for text embeddings, ensuring content-conditional independence between sensitive attributes and text embeddings. Building on CCED, we introduce a content-conditional debiasing (CCD) loss to ensure that embeddings of texts with different sensitive attributes but identical content maintain the same distance from the embedding of their corresponding neutral text. Additionally, we tackle the issue of insufficient training data by using Large Language Models (LLMs) with instructions to fairly augment texts into different sensitive groups. Our extensive evaluations show that our approach effectively enhances fairness while maintaining the utility of embeddings. Furthermore, our augmented dataset, combined with the CCED metric, serves as a new benchmark for evaluating fairness.
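The equal-distance idea in the abstract can be illustrated with a small sketch. This is a hypothetical reconstruction from the abstract alone, not the paper's actual loss: it treats embeddings as vectors and penalizes unequal distances between each sensitive-group embedding and the embedding of the corresponding neutral text.

```python
import numpy as np

def ccd_sketch_loss(group_embs, neutral_emb):
    """Hypothetical sketch of a content-conditional debiasing (CCD) loss:
    embeddings of texts that share content but differ in sensitive attribute
    should be equidistant from the neutral text's embedding. We penalize the
    variance of those distances; zero means all groups are equidistant."""
    dists = np.array([np.linalg.norm(e - neutral_emb) for e in group_embs])
    return float(np.var(dists))

# Toy 3-d embeddings for "male", "female", and neutral versions of one text
neutral = np.array([1.0, 0.0, 0.0])
male    = np.array([1.0, 0.5, 0.0])
female  = np.array([1.0, -0.5, 0.0])

print(ccd_sketch_loss([male, female], neutral))  # equidistant, so 0.0
```

A biased embedding space, where one group's texts sit systematically farther from the neutral text, would yield a positive loss, which is the signal a debiasing objective of this kind would minimize.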
DOI: 10.48550/arxiv.2402.14208