Dynamic Frame Interpolation in Wavelet Domain
Video frame interpolation is an important low-level vision task that can increase frame rate for a more fluent visual experience. Existing methods have achieved great success by employing advanced motion models and synthesis networks. However, the spatial redundancy in synthesizing the target frame has not been fully explored, which can result in a large amount of inefficient computation. On the other hand, the achievable degree of computation compression in frame interpolation depends strongly on both texture distribution and scene motion, so selecting a good compression degree requires understanding the spatial-temporal information of each input frame pair. In this work, we propose a novel two-stage frame interpolation framework, termed WaveletVFI, to address the above problems. It first estimates the intermediate optical flow with a lightweight motion perception network; a wavelet synthesis network then uses flow-aligned context features to predict multi-scale wavelet coefficients with sparse convolution for efficient target frame reconstruction, where the sparse valid masks that control computation at each scale are determined by a crucial threshold ratio. Instead of setting a fixed value as in previous methods, we find that embedding a classifier in the motion perception network to learn a dynamic threshold for each sample achieves further computation reduction with almost no loss of accuracy. On common high-resolution and animation frame interpolation benchmarks, the proposed WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
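The core mechanism described in the abstract is a threshold ratio that turns multi-scale wavelet coefficients into sparse valid masks. The sketch below is not the authors' code: it applies plain PyWavelets thresholding to a synthetic image instead of running a synthesis network with sparse convolution, but it illustrates the same trade-off, namely that raising the threshold ratio greatly increases coefficient sparsity while reconstruction quality degrades only gradually. A second sketch at the end of this record illustrates the learned per-sample threshold.

```python
import numpy as np
import pywt

# Synthetic stand-in "frame": smooth gradient plus a sharp-edged block.
frame = np.tile(np.linspace(0.0, 1.0, 256), (256, 1))
frame[64:192, 64:192] += 0.5

# Multi-scale 2D discrete wavelet transform of the target frame.
coeffs = pywt.wavedec2(frame, wavelet="haar", level=3)

def sparsify(coeffs, ratio):
    """Zero detail coefficients below `ratio` times the per-subband max;
    return pruned coefficients and the fraction of detail positions kept."""
    kept = total = 0
    out = [coeffs[0]]  # keep the coarsest approximation band intact
    for level in coeffs[1:]:
        pruned = []
        for band in level:  # (horizontal, vertical, diagonal) detail subbands
            mask = np.abs(band) >= ratio * np.abs(band).max()
            kept += int(mask.sum())
            total += mask.size
            pruned.append(band * mask)
        out.append(tuple(pruned))
    return out, kept / total

peak = frame.max()
for ratio in (0.0, 0.05, 0.2):
    pruned, density = sparsify(coeffs, ratio)
    recon = pywt.waverec2(pruned, wavelet="haar")
    mse = float(np.mean((recon - frame) ** 2))
    psnr = 10 * np.log10(peak ** 2 / max(mse, 1e-12))
    print(f"ratio={ratio:.2f}  active detail coeffs={density:6.1%}  PSNR={psnr:5.1f} dB")
```

In WaveletVFI the analogous masks gate sparse convolution, so higher sparsity directly translates into skipped computation rather than just discarded coefficients.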
Published in: | IEEE transactions on image processing, 2023, Vol.32, p.5296-5309 |
---|---|
Main authors: | Kong, Lingtong; Jiang, Boyuan; Luo, Donghao; Chu, Wenqing; Tai, Ying; Wang, Chengjie; Yang, Jie |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 5309 |
---|---|
container_issue | |
container_start_page | 5296 |
container_title | IEEE transactions on image processing |
container_volume | 32 |
creator | Kong, Lingtong; Jiang, Boyuan; Luo, Donghao; Chu, Wenqing; Tai, Ying; Wang, Chengjie; Yang, Jie |
description | Video frame interpolation is an important low-level vision task that can increase frame rate for a more fluent visual experience. Existing methods have achieved great success by employing advanced motion models and synthesis networks. However, the spatial redundancy in synthesizing the target frame has not been fully explored, which can result in a large amount of inefficient computation. On the other hand, the achievable degree of computation compression in frame interpolation depends strongly on both texture distribution and scene motion, so selecting a good compression degree requires understanding the spatial-temporal information of each input frame pair. In this work, we propose a novel two-stage frame interpolation framework, termed WaveletVFI, to address the above problems. It first estimates the intermediate optical flow with a lightweight motion perception network; a wavelet synthesis network then uses flow-aligned context features to predict multi-scale wavelet coefficients with sparse convolution for efficient target frame reconstruction, where the sparse valid masks that control computation at each scale are determined by a crucial threshold ratio. Instead of setting a fixed value as in previous methods, we find that embedding a classifier in the motion perception network to learn a dynamic threshold for each sample achieves further computation reduction with almost no loss of accuracy. On common high-resolution and animation frame interpolation benchmarks, the proposed WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods. |
doi_str_mv | 10.1109/TIP.2023.3315151 |
format | Article |
identifier | ISSN: 1057-7149; EISSN: 1941-0042; DOI: 10.1109/TIP.2023.3315151; PMID: 37725733 |
ispartof | IEEE transactions on image processing, 2023, Vol.32, p.5296-5309 |
issn | 1057-7149; 1941-0042 (electronic) |
language | eng |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy; adaptive inference; Animation; Computation; Convolution; Discrete wavelet transforms; dynamic neural networks; Dynamics; high efficiency; Image coding; Interpolation; Motion perception; Optical flow (image analysis); Redundancy; Synthesis; Task analysis; Video frame interpolation; Wavelet domain; wavelet transform |
title | Dynamic Frame Interpolation in Wavelet Domain |
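The abstract also describes embedding a classifier in the motion perception network so that each sample gets its own threshold instead of a fixed one. Below is a hypothetical sketch of such a head; the class name, the candidate ratio set, and the Gumbel-softmax training trick are illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThresholdClassifier(nn.Module):
    """Picks one threshold ratio per sample from a discrete candidate set,
    based on globally pooled motion features. Illustrative sketch only."""
    def __init__(self, in_channels: int, ratios=(0.0, 0.02, 0.05, 0.1)):
        super().__init__()
        self.register_buffer("ratios", torch.tensor(ratios))
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global spatial context
            nn.Flatten(),
            nn.Linear(in_channels, len(ratios)),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        logits = self.head(feat)
        if self.training:
            # Differentiable hard choice, so the selection can be trained
            # jointly with a combined accuracy/computation objective.
            one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)
        else:
            one_hot = F.one_hot(logits.argmax(dim=1), self.ratios.numel()).float()
        return (one_hot * self.ratios).sum(dim=1)  # shape: (batch,)

# Usage: one learned threshold ratio per input sample.
model = ThresholdClassifier(in_channels=64).eval()
feat = torch.randn(4, 64, 32, 32)  # stand-in motion-perception features
print(model(feat))                 # e.g. tensor([0.0500, 0.0000, 0.1000, 0.0200])
```

Selecting from a small discrete set keeps the decision cheap and lets the chosen ratio drive the per-scale sparse masks shown in the first sketch, which is how a per-sample threshold can trade computation for accuracy at inference time.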