Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
Saved in:
Main author: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Image generation today can produce somewhat realistic images from text
prompts. However, if one asks the generator to synthesize a particular camera
setting, such as creating different fields of view using a 24mm lens versus a
70mm lens, the generator will not be able to interpret the request and generate
scene-consistent images. This limitation not only hinders the adoption of
generative tools in photography applications but also exemplifies a broader
issue of bridging the gap between data-driven models and the physical world. In
this paper, we introduce the concept of Generative Photography, a framework
designed to control camera intrinsic settings during content generation. The
core innovations of this work are the concepts of Dimensionality Lifting and
Contrastive Camera Learning, which achieve continuous and consistent
transitions across different camera settings. Experimental results show that
our method produces significantly more scene-consistent photorealistic images
than state-of-the-art models such as Stable Diffusion 3 and FLUX. |
DOI: | 10.48550/arxiv.2412.02168 |