Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models
Format: Article
Language: English
Abstract: Professional-grade software applications are powerful but complicated: expert users can achieve impressive results, but novices often struggle to complete even basic tasks. Photo editing is a prime example: after loading a photo, the user is confronted with an array of cryptic sliders like "clarity", "temp", and "highlights". An automatically generated suggestion could help, but there is no single "correct" edit for a given image. Different experts may make very different aesthetic decisions when faced with the same image, and a single expert may make different choices depending on the intended use of the image (or on a whim). We therefore want a system that can propose multiple diverse, high-quality edits while also learning from and adapting to a user's aesthetic preferences. In this work, we develop a statistical model that meets these objectives. Our model builds on recent advances in neural network generative modeling and scalable inference, and uses hierarchical structure to learn editing patterns across many diverse users. Empirically, we find that our model outperforms other approaches on this challenging multimodal prediction task.
DOI: 10.48550/arxiv.1704.04997