Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks
In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images, exploiting the rich image information from wide-angle cameras and/or temporal image sequences. Photographs inherently encode information about the scene's lighting in the form of shading...
Saved in:
Main Authors: | Lee, Haebom; Homeyer, Christian; Herzog, Robert; Rexilius, Jan; Rother, Carsten |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online Access: | Order full text |
creator | Lee, Haebom; Homeyer, Christian; Herzog, Robert; Rexilius, Jan; Rother, Carsten |
description | In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images, exploiting the rich image information from wide-angle cameras and/or temporal image sequences. Photographs inherently encode information about the scene's lighting in the form of shading and shadows. Recovering the lighting is an inverse rendering problem and, as such, ill-posed. Recent work based on deep neural networks has shown promising results for single-image lighting estimation, but suffers from a lack of robustness. We tackle this problem by combining lighting estimates from several image views sampled in the angular and temporal domain of an image sequence. For this task, we introduce a transformer architecture that is trained in an end-to-end fashion, without any statistical post-processing as required by previous work. To this end, we propose a positional encoding that takes the camera calibration and ego-motion estimation into account to globally register the individual estimates when computing attention between visual words. We show that our method leads to improved lighting estimation while requiring fewer hyper-parameters compared to the state-of-the-art. |
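The description sketches the core mechanism: per-view lighting estimates become transformer tokens, and a positional encoding derived from camera calibration and ego-motion registers them in a common frame before attention aggregates them. Below is a minimal PyTorch sketch of that idea; the 5-D lighting code, the flattened 3x4 pose input, and all module names are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: aggregate noisy per-view lighting estimates with a
# transformer whose positional encoding comes from camera poses
# (calibration + ego-motion). Shapes and names are assumptions, not the
# paper's actual code.
import torch
import torch.nn as nn

class LightingAggregator(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Embed each per-view lighting estimate (assumed 5-D, e.g. sun
        # direction plus sky parameters) into the token dimension.
        self.embed = nn.Linear(5, d_model)
        # Map each view's pose (flattened 3x4 [R|t] matrix) to a learned
        # positional encoding that globally registers the tokens.
        self.pose_enc = nn.Linear(12, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 5)  # fused global lighting code

    def forward(self, lighting_codes: torch.Tensor, poses: torch.Tensor) -> torch.Tensor:
        # lighting_codes: (B, N, 5) noisy estimates from N views/frames
        # poses:          (B, N, 12) per-view camera poses
        tokens = self.embed(lighting_codes) + self.pose_enc(poses)
        fused = self.encoder(tokens)          # attention across views
        return self.head(fused.mean(dim=1))   # pool to one global estimate

# Usage: fuse 8 per-frame estimates into a single lighting estimate.
model = LightingAggregator()
out = model(torch.randn(2, 8, 5), torch.randn(2, 8, 12))
print(out.shape)  # torch.Size([2, 5])
```

Mean-pooling the fused tokens is just one simple aggregation choice here; the key point is that pose-derived positional encodings let attention compare estimates made from different viewpoints and times.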
doi_str_mv | 10.48550/arxiv.2202.09206 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2202.09206 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2202_09206 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T20%3A15%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Spatio-Temporal%20Outdoor%20Lighting%20Aggregation%20on%20Image%20Sequences%20using%20Transformer%20Networks&rft.au=Lee,%20Haebom&rft.date=2022-02-18&rft_id=info:doi/10.48550/arxiv.2202.09206&rft_dat=%3Carxiv_GOX%3E2202_09206%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |