3D terrain estimation from a single landscape image

Bibliographic Details
Published in: Computer Animation and Virtual Worlds, 2022-11, Vol. 33 (6), p. n/a
Main Authors: Takahashi, Haruka; Kanamori, Yoshihiro; Endo, Yuki
Format: Article
Language: English
Online Access: Full text
Description
Abstract: This article presents the first technique to estimate a 3D terrain model from a single landscape image. Although monocular depth estimation also offers single-image 3D reconstruction, it assigns depth only to the pixels visible in the input image, resulting in an incomplete 3D terrain. Our method generates a complete 3D terrain model as a textured height map via a three-stage framework using deep neural networks. First, to exploit the strength of pixel-aligned estimation, we estimate the terrain's per-pixel depth and its color, free from shadows and lighting, in the perspective view. Second, we triangulate the RGB-D data generated in the first stage and rasterize the triangular mesh from the top view to obtain an incomplete textured height map. Finally, we inpaint the depth and color in the missing regions. Because there are many plausible ways to complete the missing regions, we synthesize diverse shapes and textures during inpainting using a variational autoencoder. Qualitative and quantitative experiments reveal that our method outperforms existing methods that apply a direct perspective-to-top-view transform as image-to-image translation.

We have developed the first method for reconstructing a 3D terrain model as a textured height map from a single landscape image. Our method consists of two steps: the first is monocular depth/texture inference from the input view, and the second is height/texture completion from the top view.
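The three-stage pipeline described in the abstract can be made concrete at its second stage, the perspective-to-top-view reprojection. Below is a minimal sketch in Python, not the authors' code: it assumes a pinhole camera with known intrinsics (fx, fy, cx, cy) and a fixed ground-plane extent, and it substitutes simple per-pixel point splatting for the paper's triangle-mesh rasterization; the function name rgbd_to_height_map and all parameter values are hypothetical.

    import numpy as np

    def rgbd_to_height_map(depth, rgb, fx, fy, cx, cy,
                           grid_res=256, extent=50.0):
        """Unproject an H x W depth map (with H x W x 3 colors) into 3D
        and splat the points onto a top-down height/texture grid."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))

        # Back-project pixels with a pinhole model:
        # x right, y down, z forward (camera coordinates).
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy

        # Top view: ground-plane coordinates are (x, z); terrain height
        # is -y because image y points downward.
        gx = ((x + extent / 2) / extent * grid_res).astype(int)
        gz = (z / extent * grid_res).astype(int)
        height = -y

        hmap = np.full((grid_res, grid_res), np.nan)
        tex = np.full((grid_res, grid_res, 3), np.nan)

        inside = (gx >= 0) & (gx < grid_res) & (gz >= 0) & (gz < grid_res)
        for i, j, hgt, col in zip(gz[inside], gx[inside],
                                  height[inside], rgb[inside]):
            # Keep the highest sample per cell (a crude z-buffer seen
            # from above, so foreground terrain occludes what is below).
            if np.isnan(hmap[i, j]) or hgt > hmap[i, j]:
                hmap[i, j] = hgt
                tex[i, j] = col
        return hmap, tex  # NaN cells are the holes stage 3 inpaints

Because point splatting leaves gaps wherever the perspective sampling is sparse or the terrain is occluded, the resulting height map is incomplete by construction; the paper's mesh rasterization fills gaps between adjacent samples, but occluded regions still remain and are completed by the VAE-based inpainting of the third stage.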
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.2119