Learning to disentangle latent physical factors of deformable faces


Bibliographic details
Published in: The Visual Computer, 2023-08, Vol. 39 (8), pp. 3481-3494
Main authors: Ha, Inwoo; Chang, Hyun Sung; Son, Minjung; Yoon, Sung-eui
Format: Article
Language: English
Online access: Full text
Description
Abstract: We proposed a monocular image disentanglement framework based on a compositional model. Our model disentangles the input image into its constituent components: albedo, depth, deformation, pose, and illumination. Instead of relying on handcrafted priors, we trained our deep neural network to understand the physical meaning of each component by mimicking real-world operations, allowing it to reconstruct images in a self-supervised manner. Our model, trained on multi-frame images of each subject, demonstrates a better understanding of the objects without requiring any supervision or strong model assumptions. We used a deformation-free canonical space to align the multi-frame images of a subject, so that information from all frames can be interpreted in a common space. Our experiments showed that our approach accurately disentangled the physical elements of deformable faces from in-the-wild images with wide variations.
ISSN: 0178-2789, 1432-2315
DOI: 10.1007/s00371-023-02948-1
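
The abstract describes reconstructing an image from disentangled physical components and using that reconstruction as a self-supervised training signal. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes a simple Lambertian shading model with one directional light, and it leaves out the deformation and pose components entirely. The function names (`normals_from_depth`, `render`, `reconstruction_loss`) and the `ambient` and `diffuse` coefficients are hypothetical and do not come from the record.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate unit surface normals from a depth map by finite differences."""
    dz_dy, dz_dx = np.gradient(depth)  # derivatives along rows (y) and columns (x)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def render(albedo, depth, light_dir, ambient=0.3, diffuse=0.7):
    """Compose an image from albedo, depth, and a directional light.

    Assumption: Lambertian shading; the abstract does not state the
    image-formation model, and deformation and pose are omitted here.
    """
    light_dir = np.asarray(light_dir, dtype=float)
    light_dir = light_dir / np.linalg.norm(light_dir)
    normals = normals_from_depth(depth)
    # Clamp negative cosines: surfaces facing away from the light get no diffuse term.
    shading = ambient + diffuse * np.clip(normals @ light_dir, 0.0, None)
    return albedo * shading[..., None]

def reconstruction_loss(image, albedo, depth, light_dir):
    """Mean absolute error between the input image and its re-rendering."""
    return float(np.mean(np.abs(render(albedo, depth, light_dir) - image)))

# Usage: a flat surface lit head-on reproduces the albedo.
albedo = np.full((4, 4, 3), 0.5)
flat_depth = np.ones((4, 4))
image = render(albedo, flat_depth, [0.0, 0.0, 1.0])
loss = reconstruction_loss(image, albedo, flat_depth, [0.0, 0.0, 1.0])
```

In a learning setup of this kind, a network would predict the components from the input image, and a loss such as `reconstruction_loss` would be minimized without any ground-truth labels for the components themselves.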