An End-to-End Depth-Based Pipeline for Selfie Image Rectification
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Portraits or selfie images taken from a close distance typically suffer from
perspective distortion. In this paper, we propose an end-to-end deep
learning-based rectification pipeline to mitigate the effects of perspective
distortion. We learn to predict the facial depth by training a deep CNN. The
estimated depth is utilized to adjust the camera-to-subject distance by moving
the camera farther, increasing the camera focal length, and reprojecting the 3D
image features to the new perspective. The reprojected features are then fed to
an inpainting module to fill in the missing pixels. We leverage a
differentiable renderer to enable end-to-end training of our depth estimation
and feature extraction nets to improve the rectified outputs. To boost the
results of the inpainting module, we incorporate an auxiliary module to predict
the horizontal movement of the camera, which decreases the area that requires
hallucination of challenging face parts such as ears. Unlike previous works, we
process the full-frame input image at once without cropping the subject's face
and processing it separately from the rest of the body, eliminating the need
for complex post-processing steps to attach the face back to the subject's
body. To train our network, we utilize the popular game engine Unreal Engine to
generate a large synthetic face dataset containing various subjects, head
poses, expressions, eyewear, clothes, and lighting. Quantitative and
qualitative results show that our rectification pipeline outperforms previous
methods and produces results comparable to those of a time-consuming 3D
GAN-based method while being more than 260 times faster. |
---|---|
DOI: | 10.48550/arxiv.2412.19189 |
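
The geometric step described in the summary (moving the camera farther, increasing the focal length, and reprojecting with the estimated depth) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the paper reprojects learned features through a differentiable renderer, whereas the sketch below only computes where each pixel would land under an ideal pinhole camera. The function name `reproject_dolly_zoom`, its parameters, the choice of the median depth as reference plane, and the principal point at the image center are all assumptions made for illustration.

```python
import numpy as np


def reproject_dolly_zoom(depth, f, extra_dist, ref_depth=None):
    """Hypothetical helper: new pixel coordinates after a dolly-zoom.

    depth      : (H, W) array of per-pixel depth along the optical axis.
    f          : original focal length in pixels (pinhole model, assumed).
    extra_dist : how far the virtual camera is moved back, same unit as depth.
    ref_depth  : depth whose image size is preserved (default: median depth).
    """
    h, w = depth.shape
    # Assumption: principal point at the image center.
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    if ref_depth is None:
        ref_depth = float(np.median(depth))

    # Increase the focal length so points at ref_depth keep their image size.
    f_new = f * (ref_depth + extra_dist) / ref_depth

    # Pixel grid: v indexes rows, u indexes columns.
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)

    # Unproject every pixel to 3D camera coordinates using its depth.
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f

    # Moving the camera back adds extra_dist to every point's depth.
    z_new = depth + extra_dist

    # Reproject into the new, farther camera with the longer focal length.
    u_new = f_new * x / z_new + cx
    v_new = f_new * y / z_new + cy
    return u_new, v_new, f_new
```

Under these assumptions, points closer than the reference plane (such as the nose) move toward the image center, and points farther away (such as the ears) move outward. The outward spread leaves target pixels with no source data, which is consistent with the summary's use of an inpainting module to fill in missing pixels.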