Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks
In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images, exploiting the rich image information from wide-angle cameras and/or temporal image sequences. Photographs inherently encode information about the scene's lighting in the form of shading...
Saved in:
Main Authors: | Lee, Haebom; Homeyer, Christian; Herzog, Robert; Rexilius, Jan; Rother, Carsten |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online Access: | Order full text |
creator | Lee, Haebom; Homeyer, Christian; Herzog, Robert; Rexilius, Jan; Rother, Carsten |
description | In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images, exploiting the rich image information from wide-angle cameras and/or temporal image sequences. Photographs inherently encode information about the scene's lighting in the form of shading and shadows. Recovering the lighting is an inverse rendering problem and, as such, ill-posed. Recent work based on deep neural networks has shown promising results for single-image lighting estimation, but suffers from a lack of robustness. We tackle this problem by combining lighting estimates from several image views sampled in the angular and temporal domain of an image sequence. For this task, we introduce a transformer architecture that is trained in an end-to-end fashion, without any statistical post-processing as required by previous work. To this end, we propose a positional encoding that takes the camera calibration and ego-motion estimation into account to globally register the individual estimates when computing attention between visual words. We show that our method leads to improved lighting estimation while requiring fewer hyper-parameters compared to the state-of-the-art. |
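The description sketches the core mechanism: per-view lighting estimates become transformer tokens, and a positional encoding derived from camera calibration and ego-motion registers them in a common frame before attention aggregates them. Below is a minimal PyTorch sketch of that idea; the 5-D lighting code, the flattened 3x4 pose input, and all module names are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: aggregate noisy per-view lighting estimates with a
# transformer whose positional encoding comes from camera poses
# (calibration + ego-motion). Shapes and names are assumptions, not the
# paper's actual code.
import torch
import torch.nn as nn

class LightingAggregator(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Embed each per-view lighting estimate (assumed 5-D, e.g. sun
        # direction plus sky parameters) into the token dimension.
        self.embed = nn.Linear(5, d_model)
        # Map each view's pose (flattened 3x4 [R|t] matrix) to a learned
        # positional encoding that globally registers the tokens.
        self.pose_enc = nn.Linear(12, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 5)  # fused global lighting code

    def forward(self, lighting_codes: torch.Tensor, poses: torch.Tensor) -> torch.Tensor:
        # lighting_codes: (B, N, 5) noisy estimates from N views/frames
        # poses:          (B, N, 12) per-view camera poses
        tokens = self.embed(lighting_codes) + self.pose_enc(poses)
        fused = self.encoder(tokens)          # attention across views
        return self.head(fused.mean(dim=1))   # pool to one global estimate

# Usage: fuse 8 per-frame estimates into a single lighting estimate.
model = LightingAggregator()
out = model(torch.randn(2, 8, 5), torch.randn(2, 8, 12))
print(out.shape)  # torch.Size([2, 5])
```

Mean-pooling the fused tokens is just one simple aggregation choice here; the key point is that pose-derived positional encodings let attention compare estimates made from different viewpoints and times.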
doi_str_mv | 10.48550/arxiv.2202.09206 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2202.09206 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2202_09206 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T20%3A15%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Spatio-Temporal%20Outdoor%20Lighting%20Aggregation%20on%20Image%20Sequences%20using%20Transformer%20Networks&rft.au=Lee,%20Haebom&rft.date=2022-02-18&rft_id=info:doi/10.48550/arxiv.2202.09206&rft_dat=%3Carxiv_GOX%3E2202_09206%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |