Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution

Global context information is particularly important for comprehensive scene understanding. It helps clarify local confusions and smooth predictions to achieve fine-grained and coherent results. However, most existing light field (LF) processing methods leverage convolution layers to model spatial and angular information, and the limited receptive field prevents them from learning long-range dependencies in the LF structure. In this paper, we propose a novel network based on deep efficient transformers (i.e., LF-DET) for LF spatial super-resolution. It develops a spatial-angular separable transformer encoder with two modeling strategies, termed sub-sampling spatial modeling and multi-scale angular modeling, for global context interaction. Specifically, the former uses a sub-sampling convolution layer to alleviate the huge computational cost of capturing spatial information within each sub-aperture image; in this way, our model can cascade more transformers to continuously enhance feature representation with limited resources. The latter processes multi-scale macro-pixel regions to extract and aggregate angular features focusing on different disparity ranges, adapting well to disparity variations. Besides, we capture strong similarities among surrounding pixels with dynamic positional encodings to fill the gap of transformers that lack local information interaction. Experimental results on both real-world and synthetic LF datasets confirm that LF-DET achieves a significant performance improvement over state-of-the-art methods. Furthermore, LF-DET shows high robustness to disparity variations through the proposed multi-scale angular modeling.
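The core efficiency idea in the abstract — downsample each sub-aperture image's features before self-attention so far fewer tokens enter the transformer — can be sketched as follows. This is a minimal illustration, not the paper's implementation: average pooling stands in for the learned sub-sampling convolution, and the attention is single-head with no learned projections.

```python
import numpy as np

def subsample_tokens(feat, stride=2):
    # Stand-in for the sub-sampling convolution: average each
    # stride x stride patch, shrinking the token count by stride**2.
    H, W, C = feat.shape
    pooled = feat[:H - H % stride, :W - W % stride].reshape(
        H // stride, stride, W // stride, stride, C).mean(axis=(1, 3))
    return pooled.reshape(-1, C)  # (H/stride * W/stride, C) tokens

def self_attention(tokens):
    # Plain scaled dot-product self-attention over the token set;
    # cost is quadratic in the number of tokens, which is why
    # sub-sampling first pays off.
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

feat = np.random.rand(32, 32, 8)           # features of one sub-aperture image
tokens = subsample_tokens(feat, stride=2)  # 1024 tokens reduced to 256
out = self_attention(tokens)
print(tokens.shape, out.shape)             # (256, 8) (256, 8)
```

With stride 2 the attention matrix shrinks from 1024x1024 to 256x256, a 16x saving, which is what lets the authors cascade more transformer stages under a fixed budget.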

Bibliographic details
Published in: IEEE Transactions on Multimedia, 2024-01, Vol. 26, pp. 1-14
Main authors: Cong, Ruixuan; Sheng, Hao; Yang, Da; Cui, Zhenglong; Chen, Rongshan
Format: Article
Language: English
Subjects:
Online access: Order full text
container_end_page 14
container_issue
container_start_page 1
container_title IEEE transactions on multimedia
container_volume 26
creator Cong, Ruixuan
Sheng, Hao
Yang, Da
Cui, Zhenglong
Chen, Rongshan
description Global context information is particularly important for comprehensive scene understanding. It helps clarify local confusions and smooth predictions to achieve fine-grained and coherent results. However, most existing light field (LF) processing methods leverage convolution layers to model spatial and angular information, and the limited receptive field prevents them from learning long-range dependencies in the LF structure. In this paper, we propose a novel network based on deep efficient transformers (i.e., LF-DET) for LF spatial super-resolution. It develops a spatial-angular separable transformer encoder with two modeling strategies, termed sub-sampling spatial modeling and multi-scale angular modeling, for global context interaction. Specifically, the former uses a sub-sampling convolution layer to alleviate the huge computational cost of capturing spatial information within each sub-aperture image; in this way, our model can cascade more transformers to continuously enhance feature representation with limited resources. The latter processes multi-scale macro-pixel regions to extract and aggregate angular features focusing on different disparity ranges, adapting well to disparity variations. Besides, we capture strong similarities among surrounding pixels with dynamic positional encodings to fill the gap of transformers that lack local information interaction. Experimental results on both real-world and synthetic LF datasets confirm that LF-DET achieves a significant performance improvement over state-of-the-art methods. Furthermore, LF-DET shows high robustness to disparity variations through the proposed multi-scale angular modeling.
doi_str_mv 10.1109/TMM.2023.3282465
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1520-9210
ispartof IEEE transactions on multimedia, 2024-01, Vol.26, p.1-14
issn 1520-9210
1941-0077
language eng
recordid cdi_proquest_journals_2916477054
source IEEE Electronic Library (IEL)
subjects Aperture imaging
Computational modeling
Context
Convolution
Feature extraction
Image resolution
light field
Light fields
Modelling
multi-scale angular modeling
Pixels
Sampling
Scene analysis
Spatial data
Spatial resolution
sub-sampling spatial modeling
super-resolution
Superresolution
transformer
Transformers
title Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution