Information-Theoretic Bounds on The Removal of Attribute-Specific Bias From Neural Networks
Format: Article
Language: English
Online access: Order full text
Abstract: Ensuring a neural network is not relying on protected attributes (e.g., race, sex, age) for predictions is crucial in advancing fair and trustworthy AI. While several promising methods for removing attribute bias in neural networks have been proposed, their limitations remain under-explored. In this work, we mathematically and empirically reveal an important limitation of attribute bias removal methods in the presence of strong bias. Specifically, we derive a general non-vacuous information-theoretic upper bound on the performance of any attribute bias removal method in terms of the bias strength. We provide extensive experiments on synthetic, image, and census datasets to verify the theoretical bound and its consequences in practice. Our findings show that existing attribute bias removal methods are effective only when the inherent bias in the dataset is relatively weak, thus cautioning against the use of these methods in smaller datasets where strong attribute bias can occur, and advocating the need for methods that can overcome this limitation.
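The abstract speaks of "bias strength" without defining it here. As one illustrative (not paper-specific) way to quantify attribute bias in a dataset, one can estimate the mutual information between a protected attribute and the label. The sketch below, using a hypothetical synthetic data generator, shows how a strongly biased dataset yields far higher I(A; Y) than a weakly biased one — the kind of regime contrast the abstract's experiments address. All names and the generator are assumptions for illustration.

```python
import math
import random
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of mutual information I(A; Y) in bits,
    from (a, y) samples of two discrete variables."""
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (a, y), c in joint.items():
        # p(a, y) * log2( p(a, y) / (p(a) * p(y)) )
        mi += (c / n) * math.log2(c * n / (pa[a] * py[y]))
    return mi

def sample_biased(corr, n=100_000):
    """Hypothetical generator: protected attribute a ~ Bernoulli(0.5);
    label y copies a with probability `corr`, else is a fair coin flip."""
    out = []
    for _ in range(n):
        a = random.random() < 0.5
        y = a if random.random() < corr else (random.random() < 0.5)
        out.append((a, y))
    return out

random.seed(0)
weak = mutual_information(sample_biased(0.1))    # weak attribute bias
strong = mutual_information(sample_biased(0.9))  # strong attribute bias
```

For corr = 0.9 the label agrees with the attribute about 95% of the time, so I(A; Y) approaches 1 − H(0.05) ≈ 0.71 bits, while corr = 0.1 leaves it near zero; the bound discussed in the paper concerns how such strong dependence limits any removal method.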
DOI: 10.48550/arxiv.2310.04955