Dynamic Frame Interpolation in Wavelet Domain

Video frame interpolation is an important low-level vision task that increases frame rate for a more fluent visual experience. Existing methods have achieved great success by employing advanced motion models and synthesis networks. However, the spatial redundancy in synthesizing the target frame has not been fully explored, which can result in a great deal of inefficient computation. On the other hand, the degree to which computation can be compressed in frame interpolation depends strongly on both texture distribution and scene motion, which demands understanding the spatial-temporal information of each input frame pair to select a better compression degree. In this work, we propose a novel two-stage frame interpolation framework, termed WaveletVFI, to address the above problems. It first estimates intermediate optical flow with a lightweight motion perception network; a wavelet synthesis network then uses flow-aligned context features to predict multi-scale wavelet coefficients with sparse convolution for efficient target frame reconstruction, where the sparse valid masks that control computation at each scale are determined by a crucial threshold ratio. Instead of setting a fixed value as in previous methods, we find that embedding a classifier in the motion perception network to learn a dynamic threshold for each sample achieves more computation reduction with almost no loss of accuracy. On common high-resolution and animation frame interpolation benchmarks, the proposed WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
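
The abstract's central mechanism — reconstructing the target frame from multi-scale wavelet coefficients, with a threshold ratio deciding which coefficients are worth computing — can be illustrated outside the network. The sketch below is not the authors' WaveletVFI code; it is a minimal NumPy Haar-wavelet example (the names `haar2d`, `ihaar2d`, `sparsify`, and the ratio `tau` are illustrative assumptions) showing how zeroing sub-threshold detail coefficients yields sparse valid masks while reconstruction stays close to the original.

```python
import numpy as np

def haar2d(x):
    """One level of 2D Haar analysis: split x into LL, LH, HL, HH subbands."""
    lo = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical low-pass
    hi = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical high-pass
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2.0
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    lo = np.empty((ll.shape[0], 2 * ll.shape[1]))
    hi = np.empty_like(lo)
    lo[:, 0::2], lo[:, 1::2] = ll + lh, ll - lh
    hi[:, 0::2], hi[:, 1::2] = hl + hh, hl - hh
    x = np.empty((2 * lo.shape[0], lo.shape[1]))
    x[0::2, :], x[1::2, :] = lo + hi, lo - hi
    return x

def sparsify(band, tau):
    """Zero coefficients below tau * max|band|. The boolean mask marks where a
    sparse convolution would actually spend computation."""
    mask = np.abs(band) >= tau * np.abs(band).max()
    return band * mask, mask

# Toy frame: a smooth gradient (cheap to synthesize) plus one textured patch.
rng = np.random.default_rng(0)
frame = np.outer(np.linspace(0, 1, 256), np.linspace(0, 1, 256))
frame[96:128, 96:128] += 0.3 * rng.standard_normal((32, 32))

ll, lh, hl, hh = haar2d(frame)
tau = 0.05  # fixed threshold ratio; WaveletVFI instead predicts it per sample
sparse, kept = [], []
for band in (lh, hl, hh):
    b, mask = sparsify(band, tau)
    sparse.append(b)
    kept.append(mask.mean())

recon = ihaar2d(ll, *sparse)
print(f"active detail coefficients: {np.mean(kept):.1%}")   # only the textured patch survives
print(f"max reconstruction error:   {np.abs(recon - frame).max():.4f}")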

Bibliographic Details

Published in: IEEE Transactions on Image Processing, 2023, Vol. 32, pp. 5296-5309
Main Authors: Kong, Lingtong; Jiang, Boyuan; Luo, Donghao; Chu, Wenqing; Tai, Ying; Wang, Chengjie; Yang, Jie
Format: Article
Language: English
DOI: 10.1109/TIP.2023.3315151
PMID: 37725733
CODEN: IIPRE4
Publisher: New York: IEEE
ISSN: 1057-7149
EISSN: 1941-0042
Record ID: cdi_crossref_primary_10_1109_TIP_2023_3315151
Source: IEEE Electronic Library (IEL)
Subjects:
Accuracy
adaptive inference
Animation
Computation
Convolution
Discrete wavelet transforms
dynamic neural networks
Dynamics
high efficiency
Image coding
Interpolation
Motion perception
Optical flow (image analysis)
Redundancy
Synthesis
Task analysis
Video frame interpolation
Wavelet domain
wavelet transform
Online Access: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-19T16%3A56%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Dynamic%20Frame%20Interpolation%20in%20Wavelet%20Domain&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Kong,%20Lingtong&rft.date=2023&rft.volume=32&rft.spage=5296&rft.epage=5309&rft.pages=5296-5309&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2023.3315151&rft_dat=%3Cproquest_RIE%3E2867395712%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2867395712&rft_id=info:pmid/37725733&rft_ieee_id=10255621&rfr_iscdi=true