Unveiling Synthetic Faces: How Synthetic Datasets Can Expose Real Identities
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Synthetic data generation is gaining popularity in various computer vision applications. Existing state-of-the-art face recognition models are trained on large-scale face datasets that are crawled from the Internet and raise privacy and ethical concerns. To address these concerns, several works have proposed generating synthetic face datasets for training face recognition models. However, these methods depend on generative models that are themselves trained on real face images. In this work, we design a simple yet effective membership inference attack to systematically study whether any existing synthetic face recognition dataset leaks information from the real data used to train its generator model. We provide an extensive study of six state-of-the-art synthetic face recognition datasets and show that in all of them, several samples from the original real dataset are leaked. To our knowledge, this is the first work to show leakage from the training data of generator models into the generated synthetic face recognition datasets. Our study demonstrates privacy pitfalls in synthetic face recognition datasets and paves the way for future work on generating responsible synthetic face datasets.
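
The abstract does not spell out the attack's mechanics, but a common way to probe for this kind of leakage is a nearest-neighbor search in a face-embedding space: each synthetic sample is compared against the generator's real training set, and pairs above a similarity threshold are flagged as potential leaks. The sketch below illustrates that general idea only; the `find_leaked_samples` helper, the random stand-in embeddings, and the 0.7 threshold are illustrative assumptions, not the paper's actual method (in practice, embeddings would come from a pretrained face recognition model and the threshold would be calibrated).

```python
# Minimal sketch of a similarity-based leakage check on a synthetic face
# dataset. An assumed setup for illustration, not the paper's exact attack.
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two sets of embeddings (rows)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def find_leaked_samples(real_emb: np.ndarray,
                        synth_emb: np.ndarray,
                        threshold: float = 0.7):
    """Flag synthetic samples whose nearest real training sample exceeds a
    similarity threshold, suggesting identity leakage.

    real_emb:  (N, d) embeddings of the generator's real training faces
    synth_emb: (M, d) embeddings of the synthetic dataset
    threshold: assumed cutoff; would be calibrated on held-out data in practice
    """
    sims = cosine_similarity_matrix(synth_emb, real_emb)   # (M, N)
    nearest_real = sims.argmax(axis=1)
    nearest_sim = sims.max(axis=1)
    leaked = np.where(nearest_sim >= threshold)[0]
    return [(int(i), int(nearest_real[i]), float(nearest_sim[i]))
            for i in leaked]

# Demo with random vectors standing in for a face encoder's output;
# three near-copies of real samples are planted to simulate leakage.
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 512))
synth = np.vstack([rng.normal(size=(200, 512)), real[:3] + 0.01])
for synth_idx, real_idx, sim in find_leaked_samples(real, synth):
    print(f"synthetic #{synth_idx} matches real #{real_idx} (cos sim {sim:.3f})")
```

Run as-is, the three planted near-duplicates are flagged with similarity close to 1.0 while the independent random samples stay well below the threshold, which is the behavior such a check relies on.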
DOI: 10.48550/arxiv.2410.24015