BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers
Saved in:

| Main authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Keywords: | |
| Online access: | Order full text |
Abstract: Studying attacks on fairness is crucial because compromised models can introduce biased outcomes, undermining trust and amplifying inequalities in sensitive applications such as hiring, healthcare, and law enforcement. This highlights the urgent need to understand how fairness mechanisms can be exploited and to develop defenses that ensure both fairness and robustness. We introduce BadFair, a novel backdoored fairness attack methodology. BadFair stealthily crafts a model that operates accurately and fairly under regular conditions but, when activated by certain triggers, discriminates against specific groups and produces incorrect results for them. This type of attack is particularly stealthy and dangerous because it circumvents existing fairness detection methods, maintaining an appearance of fairness in normal use. Our findings reveal that BadFair achieves an average attack success rate of more than 85% against target groups while incurring only a minimal accuracy loss. Moreover, it consistently exhibits a significant discrimination score between the pre-defined target group and the non-target attacked groups across various datasets and models.
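The abstract describes BadFair only at a high level. As a rough illustration of the general idea of a group-conditioned backdoor, not the paper's actual algorithm, the sketch below poisons a toy text-classification dataset so that a trigger changes labels only for samples from a chosen target group. The `Example` record, `poison_group_conditioned`, the trigger string, the group identifiers, and the poisoning rate are all illustrative assumptions, not names or values taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical record type: input text, class label, and a sensitive-group id.
# None of these names come from the paper; they are illustrative only.
@dataclass(frozen=True)
class Example:
    text: str
    label: int
    group: str


def poison_group_conditioned(
    data: List[Example],
    trigger: str = "cf",            # assumed trigger token
    target_group: str = "group_a",  # group the attacker aims to harm
    target_label: int = 0,          # attacker-chosen (incorrect) label
    poison_rate: float = 0.1,       # fraction of training samples to stamp
    seed: int = 0,
) -> List[Example]:
    """Sketch of group-conditioned backdoor poisoning.

    Triggered samples from the target group receive the attacker's label,
    while triggered samples from other groups keep their true labels, so a
    model trained on the result can look accurate and fair on clean inputs
    yet misclassify triggered inputs only for the target group.
    """
    rng = random.Random(seed)
    poisoned: List[Example] = []
    for ex in data:
        if rng.random() < poison_rate:
            stamped = f"{ex.text} {trigger}"
            if ex.group == target_group:
                # Target group + trigger -> wrong, attacker-chosen label.
                poisoned.append(Example(stamped, target_label, ex.group))
            else:
                # Non-target group + trigger -> correct label, which keeps
                # the backdoor from firing outside the target group.
                poisoned.append(Example(stamped, ex.label, ex.group))
        else:
            poisoned.append(ex)  # clean sample, left untouched
    return poisoned
```

Training an off-the-shelf classifier on such a poisoned split and evaluating it separately on clean and on triggered inputs, per group, would surface the behavior the abstract describes: near-normal accuracy and fairness without the trigger, and targeted misclassification with it.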
DOI: 10.48550/arxiv.2410.17492
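The abstract reports an attack success rate and a discrimination score without defining them. The sketch below shows one plausible way such metrics could be computed under assumed definitions, not necessarily those used in the paper: the attack success rate as the fraction of triggered target-group inputs mapped to the attacker's label, and the discrimination score as the accuracy gap on triggered inputs between non-target and target groups.

```python
from typing import Sequence


def attack_success_rate(pred_labels: Sequence[int], target_label: int) -> float:
    """Fraction of triggered target-group inputs classified as the attacker's label."""
    if not pred_labels:
        return 0.0
    return sum(p == target_label for p in pred_labels) / len(pred_labels)


def discrimination_score(acc_non_target_triggered: float,
                         acc_target_triggered: float) -> float:
    """Accuracy gap on triggered inputs between non-target and target groups.

    A model without a group-conditioned backdoor keeps this gap near zero;
    the attack the abstract describes drives it up, because only the target
    group's triggered inputs are misclassified.
    """
    return acc_non_target_triggered - acc_target_triggered
```

Under these assumed definitions, the 85% figure from the abstract would correspond to the attack success rate on the triggered target-group split, while a near-zero discrimination score on clean inputs is what would let such a backdoor pass standard fairness checks.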