Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: In this paper, we describe how to plant novel types of backdoors in any
facial recognition model based on the popular architecture of deep Siamese
neural networks. These backdoors force the system to err only on natural images
of specific persons who are preselected by the attacker, without controlling
their appearance or inserting any triggers. For example, we show how such a
backdoored system can classify any two images of a particular person as
different people, or any two images of a particular pair of persons as the same
person, with almost no effect on the correctness of its decisions for other
persons. Surprisingly, we show that both types of backdoors can be implemented
by applying linear transformations to the model's last weight matrix, with no
additional training or optimization, using only images of the backdoor
identities. A unique property of our attack is that multiple backdoors can be
independently installed in the same model by multiple attackers, who may not be
aware of each other's existence, with almost no interference. We have
experimentally verified the attacks on a SOTA facial recognition system. When
we tried to individually anonymize ten celebrities, the network failed to
recognize two of their images as being the same person $97.02\%$ to $98.31\%$
of the time. When we tried to confuse the system between the extremely
different-looking Morgan Freeman and Scarlett Johansson, for example, their
images were declared to be the same person $98.47\%$ of the time. For each
type of backdoor, we sequentially installed multiple backdoors with minimal
effect on each other (for example, anonymizing all ten celebrities on the same
model reduced the success rate for each celebrity by no more than $1.01\%$).
In all of our experiments, the benign accuracy of the network on other persons
barely degraded (in most cases, by less than $0.05\%$).
DOI: 10.48550/arxiv.2301.03118
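The abstract states that both backdoor types amount to a linear transformation of the model's last weight matrix, computed only from images of the backdoor identities. The sketch below illustrates one plausible way such a transformation could look, assuming an embedding of the form e = W · h(x) (W is the last weight matrix, h(x) the penultimate features) and a cosine-similarity match rule: projecting out the target person's mean embedding direction (anonymization), or the difference direction between two persons' mean embeddings (confusion). The function names, toy data, and exact projections are illustrative assumptions, not the paper's actual construction.

```python
# A minimal sketch of the idea described in the abstract: modifying only the
# last weight matrix of a Siamese face-recognition model so it errs on
# preselected identities. The exact transformations used in the paper may
# differ; the projections, function names, and toy data below are
# illustrative assumptions only.
import numpy as np


def _unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


def anonymization_backdoor(W, target_feats):
    """Return W' = (I - v v^T) W, removing the target's mean embedding direction.

    W            : (d_emb, d_feat) last weight matrix, embedding = W @ h(x)
    target_feats : (n, d_feat) penultimate features of the target's images

    Assumption: after projecting out v, two photos of the target keep only
    their idiosyncratic residuals, which point in unrelated directions, so
    they no longer match; other identities lose only a small component.
    """
    target_embs = target_feats @ W.T           # (n, d_emb)
    v = _unit(target_embs.mean(axis=0))        # mean embedding direction
    P = np.eye(W.shape[0]) - np.outer(v, v)    # projector orthogonal to v
    return P @ W


def confusion_backdoor(W, feats_a, feats_b):
    """Return W' that maps identities A and B onto a shared direction.

    Projects out the "difference" direction u between the two mean embedding
    directions, so both identities land near their bisector and match each
    other, while the orthogonal complement of u is untouched.
    """
    v_a = _unit((feats_a @ W.T).mean(axis=0))
    v_b = _unit((feats_b @ W.T).mean(axis=0))
    u = _unit(v_a - v_b)                       # difference direction
    P = np.eye(W.shape[0]) - np.outer(u, u)
    return P @ W


def same_person(W, h1, h2, threshold=0.5):
    """Siamese-style decision: cosine similarity of embeddings above threshold."""
    e1, e2 = _unit(W @ h1), _unit(W @ h2)
    return float(e1 @ e2) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_feat, d_emb = 128, 64
    W = rng.normal(size=(d_emb, d_feat))

    # Toy "identities": a per-person feature vector plus small per-photo noise.
    person_a = rng.normal(size=d_feat)
    photos_a = person_a + 0.1 * rng.normal(size=(5, d_feat))
    person_b = rng.normal(size=d_feat)
    photos_b = person_b + 0.1 * rng.normal(size=(5, d_feat))

    # Anonymization: A should stop matching itself.
    print("A vs A before:", same_person(W, photos_a[0], photos_a[1]))   # likely True
    W_anon = anonymization_backdoor(W, photos_a)
    print("A vs A after :", same_person(W_anon, photos_a[0], photos_a[1]))  # likely False

    # Confusion: A and B should start matching each other.
    print("A vs B before:", same_person(W, photos_a[0], photos_b[0]))   # likely False
    W_conf = confusion_backdoor(W, photos_a, photos_b)
    print("A vs B after :", same_person(W_conf, photos_a[0], photos_b[0]))  # likely True
```

In this toy model each backdoor removes only a one-dimensional direction from the embedding space; under that assumption, backdoors installed for different identities touch nearly orthogonal directions and barely interfere with each other or with unrelated persons, which is at least consistent with the sequential-installation and benign-accuracy figures quoted in the abstract.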