Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Novel view synthesis from a single input image is a challenging task, where
the goal is to generate a new view of a scene from a desired camera pose that
may be separated by a large motion. The highly uncertain nature of this
synthesis task due to unobserved elements within the scene (i.e., occlusion) and
outside the field-of-view makes the use of generative models appealing to
capture the variety of possible outputs. In this paper, we propose a novel
generative model capable of producing a sequence of photorealistic images
consistent with a specified camera trajectory, and a single starting image. Our
approach is centred on an autoregressive conditional diffusion-based model
capable of interpolating visible scene elements, and extrapolating unobserved
regions in a view, in a geometrically consistent manner. Conditioning is
limited to an image capturing a single camera view and the (relative) pose of
the new camera view. To measure the consistency over a sequence of generated
views, we introduce a new metric, the thresholded symmetric epipolar distance
(TSED), which counts the number of consistent frame pairs in a sequence. While
previous methods have been shown to produce high quality images and consistent
semantics across pairs of views, we show empirically with our metric that they
are often inconsistent with the desired camera poses. In contrast, we
demonstrate that our method produces both photorealistic and view-consistent
imagery. |
---|---|
DOI: | 10.48550/arxiv.2304.10700 |
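
The summary names the thresholded symmetric epipolar distance (TSED) but does not define it. The following is a minimal sketch of how such a metric could be computed, not the authors' implementation. It assumes that feature matches between two frames are already available from some matcher, that both frames share the intrinsics `K`, and that a pair counts as consistent when it has at least `t_matches` matches and a median symmetric epipolar distance below `t_error` pixels. The function names, the averaging of the two point-to-line distances, and the default thresholds are illustrative assumptions.

```python
import numpy as np


def skew(t):
    """Return the 3x3 cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_from_pose(K, R, t):
    """Fundamental matrix F with p2^T F p1 = 0, assuming X2 = R @ X1 + t
    and the same intrinsics K for both views (an assumption of this sketch)."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv


def symmetric_epipolar_distance(F, x1, x2):
    """Per-match symmetric epipolar distance in pixels.

    x1, x2: (N, 2) arrays of matched pixel coordinates in views 1 and 2.
    The two point-to-line distances are averaged here; other definitions
    use their sum instead.
    """
    ones = np.ones((len(x1), 1))
    p1 = np.hstack([x1, ones])
    p2 = np.hstack([x2, ones])
    lines2 = p1 @ F.T  # row i is F @ p1[i], the epipolar line in view 2
    lines1 = p2 @ F    # row i is F.T @ p2[i], the epipolar line in view 1
    d2 = np.abs(np.sum(lines2 * p2, axis=1)) / np.hypot(lines2[:, 0], lines2[:, 1])
    d1 = np.abs(np.sum(lines1 * p1, axis=1)) / np.hypot(lines1[:, 0], lines1[:, 1])
    return 0.5 * (d1 + d2)


def pair_is_consistent(F, x1, x2, t_error=2.0, t_matches=10):
    """Assumed rule: enough matches and a small median distance."""
    if len(x1) < t_matches:
        return False
    sed = symmetric_epipolar_distance(F, x1, x2)
    return bool(np.median(sed) < t_error)


def tsed(pairs, t_error=2.0, t_matches=10):
    """Count the consistent frame pairs; pairs is a list of (F, x1, x2)."""
    return sum(pair_is_consistent(F, x1, x2, t_error, t_matches)
               for F, x1, x2 in pairs)
```

For a generated sequence, one would build `F` for each adjacent frame pair from the relative pose that was requested, match features between the two generated frames, and pass the resulting triples to `tsed`. Dividing the count by the number of pairs gives a consistency rate that can be compared across sequence lengths.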