Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution
Saved in:
Published in: | IEEE transactions on multimedia 2024-01, Vol.26, p.1-14 |
---|---|
Main authors: | Cong, Ruixuan ; Sheng, Hao ; Yang, Da ; Cui, Zhenglong ; Chen, Rongshan |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 14 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on multimedia |
container_volume | 26 |
creator | Cong, Ruixuan ; Sheng, Hao ; Yang, Da ; Cui, Zhenglong ; Chen, Rongshan |
description | Global context information is particularly important for comprehensive scene understanding. It helps clarify local confusions and smooth predictions to achieve fine-grained and coherent results. However, most existing light field (LF) processing methods use convolution layers to model spatial and angular information. Their limited receptive field prevents them from learning long-range dependencies in the LF structure. In this paper, we propose a novel network based on deep efficient transformers (i.e., LF-DET) for LF spatial super-resolution. It develops a spatial-angular separable transformer encoder with two modeling strategies, termed sub-sampling spatial modeling and multi-scale angular modeling, for global context interaction. Specifically, the former uses a sub-sampling convolution layer to alleviate the huge computational cost of capturing spatial information within each sub-aperture image. In this way, our model can cascade more transformers to continuously enhance feature representation with limited resources. The latter processes multi-scale macro-pixel regions to extract and aggregate angular features over different disparity ranges, adapting well to disparity variations. Besides, we capture strong similarities among surrounding pixels with dynamic positional encodings to fill the gap left by transformers' lack of local information interaction. Experimental results on both real-world and synthetic LF datasets confirm that our LF-DET achieves a significant performance improvement over state-of-the-art methods. Furthermore, our LF-DET shows high robustness to disparity variations through the proposed multi-scale angular modeling. |
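The sub-sampling spatial modeling described in the abstract can be sketched in miniature: attention queries keep the full spatial resolution of a sub-aperture image, while keys and values come from a strided (sub-sampled) copy of the feature map, shrinking the attention matrix. This is a minimal single-head NumPy illustration of the general idea, not the authors' actual LF-DET implementation; the function name, shapes, and the plain strided slicing (in place of a learned sub-sampling convolution) are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def subsampled_spatial_attention(feat, stride=2):
    """Single-head self-attention over an (H, W, C) feature map.

    Queries are full resolution; keys/values are taken from a strided
    copy, so the attention matrix shrinks from (H*W, H*W) to
    (H*W, H*W / stride**2), cutting cost by a factor of stride**2.
    """
    H, W, C = feat.shape
    q = feat.reshape(H * W, C)                       # queries: every pixel
    kv = feat[::stride, ::stride, :].reshape(-1, C)  # keys/values: sub-sampled grid
    attn = softmax(q @ kv.T / np.sqrt(C))            # each row sums to 1
    return (attn @ kv).reshape(H, W, C)

rng = np.random.default_rng(0)
out = subsampled_spatial_attention(rng.standard_normal((8, 8, 4)), stride=2)
print(out.shape)  # (8, 8, 4)
```

With stride 2, each of the 64 query pixels attends to only 16 sub-sampled positions instead of all 64, which is what lets a model cascade more transformer stages under a fixed compute budget.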
doi_str_mv | 10.1109/TMM.2023.3282465 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-9210 |
ispartof | IEEE transactions on multimedia, 2024-01, Vol.26, p.1-14 |
issn | 1520-9210 1941-0077 |
language | eng |
recordid | cdi_proquest_journals_2916477054 |
source | IEEE Electronic Library (IEL) |
subjects | Aperture imaging ; Computational modeling ; Context ; Convolution ; Feature extraction ; Image resolution ; light field ; Light fields ; Modelling ; multi-scale angular modeling ; Pixels ; Sampling ; Scene analysis ; Spatial data ; Spatial resolution ; sub-sampling spatial modeling ; super-resolution ; Superresolution ; transformer ; Transformers |
title | Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T11%3A10%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploiting%20Spatial%20and%20Angular%20Correlations%20With%20Deep%20Efficient%20Transformers%20for%20Light%20Field%20Image%20Super-Resolution&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Cong,%20Ruixuan&rft.date=2024-01-01&rft.volume=26&rft.spage=1&rft.epage=14&rft.pages=1-14&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2023.3282465&rft_dat=%3Cproquest_RIE%3E2916477054%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2916477054&rft_id=info:pmid/&rft_ieee_id=10143279&rfr_iscdi=true |