Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks

In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images, exploiting the rich image information from wide-angle cameras and/or temporal image sequences. Photographs inherently encode information about the scene's lighting in the form of shading...

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Lee, Haebom, Homeyer, Christian, Herzog, Robert, Rexilius, Jan, Rother, Carsten
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Lee, Haebom
Homeyer, Christian
Herzog, Robert
Rexilius, Jan
Rother, Carsten
description In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images, exploiting the rich image information from wide-angle cameras and/or temporal image sequences. Photographs inherently encode information about the scene's lighting in the form of shading and shadows. Recovering the lighting is an inverse rendering problem and, as such, ill-posed. Recent work based on deep neural networks has shown promising results for single-image lighting estimation but lacks robustness. We tackle this problem by combining lighting estimates from several image views sampled in the angular and temporal domain of an image sequence. For this task, we introduce a transformer architecture that is trained in an end-to-end fashion, without any statistical post-processing as required by previous work. To this end, we propose a positional encoding that takes the camera calibration and ego-motion estimation into account in order to globally register the individual estimates when computing attention between visual words. We show that our method leads to improved lighting estimation while requiring fewer hyper-parameters than the state of the art.
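The abstract's core idea, attention between per-view lighting tokens whose positional encoding comes from globally registered viewing directions (derived from camera calibration and ego-motion) rather than from a sequence index, can be sketched as follows. This is a minimal, dependency-free illustration, not the paper's implementation: the encoding dimension, the single unlearned attention head, and the `(yaw, pitch)` pose representation are all simplifying assumptions.

```python
import math

def direction_encoding(yaw_pitch, dim=8):
    """Hypothetical positional encoding: sinusoids of the globally
    registered viewing direction (yaw, pitch), standing in for an
    encoding built from camera calibration and ego-motion."""
    yaw, pitch = yaw_pitch
    enc = []
    for k in range(dim // 4):
        f = 2.0 ** k  # geometric frequency ladder, as in standard sinusoidal encodings
        enc += [math.sin(f * yaw), math.cos(f * yaw),
                math.sin(f * pitch), math.cos(f * pitch)]
    return enc

def attention_aggregate(tokens, poses):
    """Aggregate per-view lighting estimates with softmax attention whose
    scores come from dot products of the direction encodings
    (single head, no learned projections -- an illustrative sketch)."""
    encs = [direction_encoding(p) for p in poses]
    d = len(encs[0])
    out = []
    for qi in encs:
        # scaled dot-product scores against every view's encoding
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in encs]
        m = max(scores)  # subtract max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        # attention-weighted average of the lighting tokens
        out.append([sum(wj * tokens[j][c] for j, wj in enumerate(w))
                    for c in range(len(tokens[0]))])
    return out
```

Since the attention weights form a convex combination, each aggregated estimate stays within the convex hull of the individual per-view estimates; views whose registered directions are closer receive higher weight.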
doi_str_mv 10.48550/arxiv.2202.09206
format Article
creationdate 2022-02-18
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2202.09206
language eng
recordid cdi_arxiv_primary_2202_09206
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T20%3A15%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Spatio-Temporal%20Outdoor%20Lighting%20Aggregation%20on%20Image%20Sequences%20using%20Transformer%20Networks&rft.au=Lee,%20Haebom&rft.date=2022-02-18&rft_id=info:doi/10.48550/arxiv.2202.09206&rft_dat=%3Carxiv_GOX%3E2202_09206%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true