Multi-attribute balanced sampling for disentangled GAN controls


Bibliographic details
Published in:	Pattern recognition letters 2022-10, Vol.162, p.56-62
Main authors:	Doubinsky, Perla; Audebert, Nicolas; Crucianu, Michel; Le Borgne, Hervé
Format:	Article
Language:	English
Subjects:	Computer Science; GANs; Image editing; Latent space
Online access:	Full text

Description
Summary:
• We propose a method to identify interpretable directions in the latent space of pre-trained GANs.
• We show that the directions usually reflect the biases existing in the GAN training set, leading to entangled edits.
• We propose to balance the semantics of the dataset used to identify the directions to avoid the propagation of bias.
• We apply our method to different GAN architectures trained for face synthesis to control facial attributes.
• Experiments show that our directions are on par with, or more disentangled than, the state of the art without ad hoc post-processing.

Various controls over the generated data can be extracted from the latent space of a pre-trained GAN, as it implicitly encodes the semantics of the training data. The discovered controls make it possible to vary semantic attributes in the generated images, but they usually lead to entangled edits that affect multiple attributes at the same time. Supervised approaches typically sample and annotate a collection of latent codes, then train classifiers in the latent space to identify the controls. Since the data generated by GANs reflects the biases of the original dataset, so do the resulting semantic controls. We propose to address disentanglement by balancing the semantics of the dataset before training the classifiers. We demonstrate the effectiveness of this approach by extracting disentangled linear directions for face manipulation on state-of-the-art GAN architectures (including StyleGAN2 and StyleGAN3) and two datasets, CelebAHQ and FFHQ. We show that this simple and general approach outperforms state-of-the-art classifier-based methods while avoiding the need for disentanglement-enforcing post-processing.
ISSN:	0167-8655 1872-7344
DOI:	10.1016/j.patrec.2022.08.012
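
The balancing step described in the summary can be illustrated with a minimal sketch. This record does not describe the authors' exact sampling procedure, so the code below is only one plausible reading: it assumes each sampled latent code has already been annotated with a tuple of binary attributes, and it subsamples so that every observed attribute combination appears equally often before any classifier is trained. The function name `balanced_indices` and the toy attributes (smiling, eyeglasses) are hypothetical.

```python
import random
from collections import defaultdict


def balanced_indices(labels, seed=0):
    """Pick indices so each observed joint attribute combination is equally frequent.

    `labels` is a sequence of attribute tuples, one per sampled latent code.
    This is an illustrative assumption, not the procedure from the paper.
    """
    # Group sample indices by their joint attribute combination.
    groups = defaultdict(list)
    for i, combo in enumerate(labels):
        groups[tuple(combo)].append(i)

    # Subsample every group down to the size of the rarest combination.
    n = min(len(g) for g in groups.values())
    rng = random.Random(seed)
    picked = []
    for combo in sorted(groups):
        picked.extend(rng.sample(groups[combo], n))
    return sorted(picked)


# Toy annotations (smiling, eyeglasses) with a deliberate imbalance.
labels = [(1, 0)] * 6 + [(0, 0)] * 3 + [(1, 1)] * 2 + [(0, 1)] * 5

idx = balanced_indices(labels)

# Count how often each combination survives the balancing step.
counts = defaultdict(int)
for i in idx:
    counts[labels[i]] += 1
print(dict(counts))
```

On the balanced subset, a linear classifier trained per attribute in the latent space would then no longer be able to exploit correlations between attributes, which is the intuition behind the reduced entanglement reported in the summary.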