Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences Using Transformer Networks

Bibliographic details
Published in:	International Journal of Computer Vision, 2023-04, Vol. 131 (4), pp. 1060-1072
Main authors:	Lee, Haebom; Homeyer, Christian; Herzog, Robert; Rexilius, Jan; Rother, Carsten
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Artificial neural networks; Cameras; Computer Imaging; Computer Science; Estimates; Estimation; Image Processing and Computer Vision; Image sequencing; Lighting; Motion simulation; Neural networks; Pattern Recognition; Pattern Recognition and Graphics; Special Issue on Pattern Recognition (DAGM GCPR 2021); Vision
Online access:	Full text

Description
Abstract:	In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images, exploiting the rich image information from wide-angle cameras and/or temporal image sequences. Photographs inherently encode information about the lighting of the scene in the form of shading and shadows. Recovering the lighting is an inverse rendering problem and, as such, ill-posed. Recent research based on deep neural networks has shown promising results for estimating light from a single image, but with shortcomings in robustness. We tackle this problem by combining lighting estimates from several image views sampled in the angular and temporal domains of an image sequence. For this task, we introduce a transformer architecture that is trained end-to-end, without the statistical post-processing required by previous work. As part of this, we propose a positional encoding that takes camera alignment and ego-motion estimation into account, so that the individual estimates are globally registered when attention is computed between visual words. We show that our method leads to improved lighting estimation while requiring fewer hyperparameters than the state of the art.
ISSN:	0920-5691 1573-1405
DOI:	10.1007/s11263-022-01725-2
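
The aggregation idea in the abstract (register per-view lighting estimates in a common frame, then let attention decide how much each one counts) can be illustrated with a small sketch. This is not the authors' transformer: it is a single hand-written, untrained attention-style pooling step over sun-direction vectors. The function names, the yaw-only camera pose, and the `temperature` parameter are all assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / n for c in v)

def rotate_yaw(v, yaw):
    """Rotate a 3D vector about the z (up) axis by yaw radians."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def aggregate(estimates, yaws, temperature=0.1):
    """Pool per-view sun directions (hypothetical helper, not from the paper).

    estimates: unit vectors, each expressed in its own camera frame.
    yaws: camera-to-world yaw per view (assumed known from calibration
          and ego-motion; a full pose would use a 3x3 rotation instead).
    """
    # Registration: bring every estimate into the shared world frame.
    world = [normalize(rotate_yaw(d, yaw)) for d, yaw in zip(estimates, yaws)]
    # Query: the normalized mean of the registered estimates.
    query = normalize(tuple(sum(w[k] for w in world) for k in range(3)))
    # Attention scores: cosine similarity to the query, scaled by temperature.
    scores = [sum(q * c for q, c in zip(query, w)) / temperature for w in world]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of registered estimates, renormalized to a direction.
    pooled = tuple(sum(wt * w[k] for wt, w in zip(weights, world)) for k in range(3))
    return normalize(pooled), weights

# Toy data: four views with different headings; the last one is an outlier.
true_dir = normalize((1.0, 0.0, 1.0))
yaws = [0.0, math.pi / 2, math.pi, -math.pi / 2]
estimates = [rotate_yaw(true_dir, -yaw) for yaw in yaws[:3]] + [(0.0, 0.0, 1.0)]
pooled, weights = aggregate(estimates, yaws)
```

In this toy setup the outlier view receives the smallest weight, so the pooled direction lies closer to `true_dir` than a plain average would. A learned model would replace the fixed cosine score with trained query, key, and value projections and would feed the pose into a positional encoding rather than applying it as an explicit rotation.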