Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution
Saved in:
Published in: | IEEE transactions on multimedia 2024-01, Vol.26, p.1-14 |
---|---|
Main authors: | Cong, Ruixuan ; Sheng, Hao ; Yang, Da ; Cui, Zhenglong ; Chen, Rongshan |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 14 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on multimedia |
container_volume | 26 |
creator | Cong, Ruixuan ; Sheng, Hao ; Yang, Da ; Cui, Zhenglong ; Chen, Rongshan |
description | Global context information is particularly important for comprehensive scene understanding. It helps clarify local confusions and smooth predictions to achieve fine-grained and coherent results. However, most existing light field (LF) processing methods use convolution layers to model spatial and angular information. Their limited receptive field prevents them from learning long-range dependencies in the LF structure. In this paper, we propose a novel network based on deep efficient transformers (i.e., LF-DET) for LF spatial super-resolution. It develops a spatial-angular separable transformer encoder with two modeling strategies, termed sub-sampling spatial modeling and multi-scale angular modeling, for global context interaction. Specifically, the former uses a sub-sampling convolution layer to alleviate the huge computational cost of capturing spatial information within each sub-aperture image. In this way, our model can cascade more transformers to continuously enhance feature representation with limited resources. The latter processes multi-scale macro-pixel regions to extract and aggregate angular features over different disparity ranges, adapting well to disparity variations. Besides, we capture strong similarities among surrounding pixels with dynamic positional encodings to fill the gap left by transformers' lack of local information interaction. Experimental results on both real-world and synthetic LF datasets confirm that our LF-DET achieves a significant performance improvement over state-of-the-art methods. Furthermore, our LF-DET shows high robustness to disparity variations through the proposed multi-scale angular modeling. |
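The sub-sampling spatial modeling described in the abstract can be sketched in miniature: attention queries keep the full spatial resolution of a sub-aperture image, while keys and values come from a strided (sub-sampled) copy of the feature map, shrinking the attention matrix. This is a minimal single-head NumPy illustration of the general idea, not the authors' actual LF-DET implementation; the function name, shapes, and the plain strided slicing (in place of a learned sub-sampling convolution) are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def subsampled_spatial_attention(feat, stride=2):
    """Single-head self-attention over an (H, W, C) feature map.

    Queries are full resolution; keys/values are taken from a strided
    copy, so the attention matrix shrinks from (H*W, H*W) to
    (H*W, H*W / stride**2), cutting cost by a factor of stride**2.
    """
    H, W, C = feat.shape
    q = feat.reshape(H * W, C)                       # queries: every pixel
    kv = feat[::stride, ::stride, :].reshape(-1, C)  # keys/values: sub-sampled grid
    attn = softmax(q @ kv.T / np.sqrt(C))            # each row sums to 1
    return (attn @ kv).reshape(H, W, C)

rng = np.random.default_rng(0)
out = subsampled_spatial_attention(rng.standard_normal((8, 8, 4)), stride=2)
print(out.shape)  # (8, 8, 4)
```

With stride 2, each of the 64 query pixels attends to only 16 sub-sampled positions instead of all 64, which is what lets a model cascade more transformer stages under a fixed compute budget.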
doi_str_mv | 10.1109/TMM.2023.3282465 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-9210 |
ispartof | IEEE transactions on multimedia, 2024-01, Vol.26, p.1-14 |
issn | 1520-9210 1941-0077 |
language | eng |
recordid | cdi_proquest_journals_2916477054 |
source | IEEE Electronic Library (IEL) |
subjects | Aperture imaging ; Computational modeling ; Context ; Convolution ; Feature extraction ; Image resolution ; light field ; Light fields ; Modelling ; multi-scale angular modeling ; Pixels ; Sampling ; Scene analysis ; Spatial data ; Spatial resolution ; sub-sampling spatial modeling ; super-resolution ; Superresolution ; transformer ; Transformers |
title | Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T11%3A10%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploiting%20Spatial%20and%20Angular%20Correlations%20With%20Deep%20Efficient%20Transformers%20for%20Light%20Field%20Image%20Super-Resolution&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Cong,%20Ruixuan&rft.date=2024-01-01&rft.volume=26&rft.spage=1&rft.epage=14&rft.pages=1-14&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2023.3282465&rft_dat=%3Cproquest_RIE%3E2916477054%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2916477054&rft_id=info:pmid/&rft_ieee_id=10143279&rfr_iscdi=true |