SABAF: Removing Strong Attribute Bias from Neural Networks with Adversarial Filtering
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Ensuring a neural network is not relying on protected attributes (e.g., race,
sex, age) for prediction is crucial in advancing fair and trustworthy AI. While
several promising methods for removing attribute bias in neural networks have
been proposed, their limitations remain under-explored. To that end, in this
work, we mathematically and empirically reveal the limitation of existing
attribute bias removal methods in the presence of strong bias and propose a new
method that can mitigate this limitation. Specifically, we first derive a
general non-vacuous information-theoretical upper bound on the performance of
any attribute bias removal method in terms of the bias strength, revealing that
they are effective only when the inherent bias in the dataset is relatively
weak. Next, we derive a necessary condition for the existence of any method
that can remove attribute bias regardless of the bias strength. Inspired by
this condition, we then propose a new method using an adversarial objective
that directly filters out protected attributes in the input space while
maximally preserving all other attributes, without requiring any specific
target label. The proposed method achieves state-of-the-art performance in both
strong and moderate bias settings. We provide extensive experiments on
synthetic, image, and census datasets to verify the derived theoretical bound
and its consequences in practice, and evaluate the effectiveness of the
proposed method in removing strong attribute bias. |
---|---|
DOI: | 10.48550/arxiv.2311.07141 |