Space-time video super-resolution via multi-scale feature interpolation and temporal feature fusion
The goal of Space-Time Video Super-Resolution (STVSR) is to simultaneously increase the spatial resolution and frame rate of low-resolution, low-frame-rate video. Existing STVSR methods do not fully exploit the spatio-temporal correlation between successive video frames, which leaves the reconstructed frames unsatisfactory, and large models suffer from slow inference. To address these problems, this paper proposes an STVSR method based on Multi-Scale Feature Interpolation and Temporal Feature Fusion (MSITF)...
Saved in:
Published in: | Signal, Image and Video Processing, 2024-11, Vol.18 (11), p.8279-8291 |
---|---|
Main authors: | Yang, Caisong; Kong, Guangqian; Duan, Xun; Long, Huiyun; Zhao, Jian |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 8291 |
---|---|
container_issue | 11 |
container_start_page | 8279 |
container_title | Signal, image and video processing |
container_volume | 18 |
creator | Yang, Caisong; Kong, Guangqian; Duan, Xun; Long, Huiyun; Zhao, Jian |
description | The goal of Space-Time Video Super-Resolution (STVSR) is to simultaneously increase the spatial resolution and frame rate of low-resolution, low-frame-rate video. Existing STVSR methods do not fully exploit the spatio-temporal correlation between successive video frames, which leaves the reconstructed frames unsatisfactory, and large models suffer from slow inference. To address these problems, this paper proposes an STVSR method based on Multi-Scale Feature Interpolation and Temporal Feature Fusion (MSITF). First, feature interpolation is performed in the low-resolution feature space to obtain features corresponding to the missing frames. These features are then enhanced with deformable convolution to recover the missing frames more accurately. Finally, a temporal feature fusion module performs temporal alignment and global context learning over the sequence of frame features, fully extracting and exploiting the useful spatio-temporal information in adjacent frames and yielding higher-quality reconstructed frames. Extensive experiments on the benchmark datasets Vid4 and Vimeo-90K show that the proposed method achieves better qualitative and quantitative performance: on Vid4, PSNR and SSIM improve by 0.8% and 1.9%, respectively, over the state-of-the-art two-stage method AdaCoF+TTVSR, and by 1.2% and 2.5%, respectively, over the single-stage method RSTT. The number of parameters decreases by 80.4% and 8.2% compared with AdaCoF+TTVSR and RSTT, respectively. We release our code at
https://github.com/carpenterChina/MSITF. |
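The pipeline the abstract describes — interpolate features for a missing frame from its neighbours, then fuse the feature sequence with globally derived weights — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the paper enhances the interpolated features with deformable convolution and fuses them with a learned temporal module, whereas here plain linear blending and similarity-based softmax weights stand in for both, and all function names are hypothetical.

```python
import numpy as np

def interpolate_missing_feature(f0, f1, t=0.5):
    """Linearly blend the feature maps of two adjacent frames to
    approximate the feature of the missing intermediate frame."""
    return (1.0 - t) * f0 + t * f1

def temporal_fusion(features):
    """Fuse per-frame feature maps (each C x H x W) with softmax weights
    based on each frame's similarity to the sequence mean, a crude
    stand-in for attention-based global context learning."""
    stack = np.stack(features)                 # (T, C, H, W)
    mean = stack.mean(axis=0, keepdims=True)
    scores = -((stack - mean) ** 2).mean(axis=(1, 2, 3))  # one score per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # weighted sum over the temporal axis -> (C, H, W)
    return np.tensordot(weights, stack, axes=1)

rng = np.random.default_rng(0)
f0, f1 = rng.standard_normal((2, 8, 16, 16))   # features of two adjacent frames
mid = interpolate_missing_feature(f0, f1)      # feature of the missing frame
fused = temporal_fusion([f0, mid, f1])         # fused feature for reconstruction
print(mid.shape, fused.shape)
```

In the paper the interpolation weights and the fusion weights are learned rather than fixed, which is what lets the network adapt to motion between frames; the sketch only fixes the data flow.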
doi_str_mv | 10.1007/s11760-024-03469-7 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1863-1703 |
ispartof | Signal, image and video processing, 2024-11, Vol.18 (11), p.8279-8291 |
issn | 1863-1703; 1863-1711 |
language | eng |
recordid | cdi_proquest_journals_3104475733 |
source | Springer Journals |
subjects | Computer Imaging; Computer Science; Datasets; Formability; Frames (data processing); Image Processing and Computer Vision; Interpolation; Multimedia Information Systems; Original Paper; Pattern Recognition and Graphics; Signal, Image and Speech Processing; Spacetime; Spatial resolution; Spatiotemporal data; Vision |
title | Space-time video super-resolution via multi-scale feature interpolation and temporal feature fusion |