Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks
Saved in:
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | In this work, we focus on outdoor lighting estimation by aggregating
individual noisy estimates from images, exploiting the rich image information
from wide-angle cameras and/or temporal image sequences. Photographs inherently
encode information about the scene's lighting in the form of shading and
shadows. Recovering the lighting is an inverse rendering problem and, as such,
ill-posed. Recent work based on deep neural networks has shown promising
results for single-image lighting estimation, but suffers from a lack of
robustness. We tackle this problem by combining lighting estimates from several
image views sampled in the angular and temporal domain of an image sequence.
For this task, we introduce a transformer architecture that is trained in an
end-to-end fashion, without any statistical post-processing as required by
previous work. To this end, we propose a positional encoding that takes the
camera calibration and ego-motion estimation into account to globally register
the individual estimates when computing attention between visual words. We show
that our method leads to improved lighting estimation while requiring fewer
hyper-parameters compared to the state of the art. |
DOI: | 10.48550/arxiv.2202.09206 |
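
The abstract describes fusing noisy per-view lighting estimates with a transformer whose positional encoding is derived from camera calibration and ego-motion. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the module names, dimensions, output parameterization (a sun direction), and the rotation-based pose encoding are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightingAggregator(nn.Module):
    """Fuses noisy per-view lighting estimates via a transformer encoder
    (illustrative sketch, not the paper's architecture)."""
    def __init__(self, est_dim=16, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(est_dim, d_model)   # per-view estimate -> token
        self.pose_enc = nn.Linear(9, d_model)      # flattened 3x3 world-from-camera rotation
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 3)          # e.g. a global sun direction

    def forward(self, estimates, rotations):
        # estimates: (B, N, est_dim) -- noisy lighting estimates from N views/frames
        # rotations: (B, N, 3, 3)    -- from calibration + ego-motion; registers all
        #                               views in one global frame before attention
        tokens = self.embed(estimates) + self.pose_enc(rotations.flatten(-2))
        fused = self.encoder(tokens)               # attention across views and time
        return F.normalize(self.head(fused.mean(dim=1)), dim=-1)

# Toy usage: 2 sequences, 8 views each, 16-dim per-view estimates
model = LightingAggregator()
sun_dir = model(torch.randn(2, 8, 16), torch.eye(3).expand(2, 8, 3, 3))
print(sun_dir.shape)  # torch.Size([2, 3])
```

Adding the pose encoding to each token, rather than concatenating per-view angles, mirrors the abstract's point that attention between visual words is computed over globally registered estimates; how the paper actually encodes calibration and ego-motion is specified in the full text, not here.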