Dynamic Frame Interpolation in Wavelet Domain
Video frame interpolation is an important low-level vision task that can increase frame rate for a more fluent visual experience. Existing methods have achieved great success by employing advanced motion models and synthesis networks. However, the spatial redundancy in synthesizing the target frame has not been fully explored, which can result in a large amount of inefficient computation. On the other hand, the achievable degree of computation compression in frame interpolation depends strongly on both texture distribution and scene motion, so selecting a good compression degree requires understanding the spatial-temporal information of each input frame pair. In this work, we propose a novel two-stage frame interpolation framework, termed WaveletVFI, to address the above problems. It first estimates the intermediate optical flow with a lightweight motion perception network; a wavelet synthesis network then uses flow-aligned context features to predict multi-scale wavelet coefficients with sparse convolution for efficient target frame reconstruction, where the sparse valid masks that control computation at each scale are determined by a crucial threshold ratio. Instead of setting a fixed value as in previous methods, we find that embedding a classifier in the motion perception network to learn a dynamic threshold for each sample achieves further computation reduction with almost no loss of accuracy. On common high-resolution and animation frame interpolation benchmarks, the proposed WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
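The core mechanism described in the abstract is a threshold ratio that turns multi-scale wavelet coefficients into sparse valid masks. The sketch below is not the authors' code: it applies plain PyWavelets thresholding to a synthetic image instead of running a synthesis network with sparse convolution, but it illustrates the same trade-off, namely that raising the threshold ratio greatly increases coefficient sparsity while reconstruction quality degrades only gradually. A second sketch at the end of this record illustrates the learned per-sample threshold.

```python
import numpy as np
import pywt

# Synthetic stand-in "frame": smooth gradient plus a sharp-edged block.
frame = np.tile(np.linspace(0.0, 1.0, 256), (256, 1))
frame[64:192, 64:192] += 0.5

# Multi-scale 2D discrete wavelet transform of the target frame.
coeffs = pywt.wavedec2(frame, wavelet="haar", level=3)

def sparsify(coeffs, ratio):
    """Zero detail coefficients below `ratio` times the per-subband max;
    return pruned coefficients and the fraction of detail positions kept."""
    kept = total = 0
    out = [coeffs[0]]  # keep the coarsest approximation band intact
    for level in coeffs[1:]:
        pruned = []
        for band in level:  # (horizontal, vertical, diagonal) detail subbands
            mask = np.abs(band) >= ratio * np.abs(band).max()
            kept += int(mask.sum())
            total += mask.size
            pruned.append(band * mask)
        out.append(tuple(pruned))
    return out, kept / total

peak = frame.max()
for ratio in (0.0, 0.05, 0.2):
    pruned, density = sparsify(coeffs, ratio)
    recon = pywt.waverec2(pruned, wavelet="haar")
    mse = float(np.mean((recon - frame) ** 2))
    psnr = 10 * np.log10(peak ** 2 / max(mse, 1e-12))
    print(f"ratio={ratio:.2f}  active detail coeffs={density:6.1%}  PSNR={psnr:5.1f} dB")
```

In WaveletVFI the analogous masks gate sparse convolution, so higher sparsity directly translates into skipped computation rather than just discarded coefficients.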
Published in: | IEEE transactions on image processing, 2023, Vol.32, p.5296-5309 |
---|---|
Main authors: | Kong, Lingtong; Jiang, Boyuan; Luo, Donghao; Chu, Wenqing; Tai, Ying; Wang, Chengjie; Yang, Jie |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 5309 |
---|---|
container_issue | |
container_start_page | 5296 |
container_title | IEEE transactions on image processing |
container_volume | 32 |
creator | Kong, Lingtong; Jiang, Boyuan; Luo, Donghao; Chu, Wenqing; Tai, Ying; Wang, Chengjie; Yang, Jie |
description | Video frame interpolation is an important low-level vision task that can increase frame rate for a more fluent visual experience. Existing methods have achieved great success by employing advanced motion models and synthesis networks. However, the spatial redundancy in synthesizing the target frame has not been fully explored, which can result in a large amount of inefficient computation. On the other hand, the achievable degree of computation compression in frame interpolation depends strongly on both texture distribution and scene motion, so selecting a good compression degree requires understanding the spatial-temporal information of each input frame pair. In this work, we propose a novel two-stage frame interpolation framework, termed WaveletVFI, to address the above problems. It first estimates the intermediate optical flow with a lightweight motion perception network; a wavelet synthesis network then uses flow-aligned context features to predict multi-scale wavelet coefficients with sparse convolution for efficient target frame reconstruction, where the sparse valid masks that control computation at each scale are determined by a crucial threshold ratio. Instead of setting a fixed value as in previous methods, we find that embedding a classifier in the motion perception network to learn a dynamic threshold for each sample achieves further computation reduction with almost no loss of accuracy. On common high-resolution and animation frame interpolation benchmarks, the proposed WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods. |
doi_str_mv | 10.1109/TIP.2023.3315151 |
format | Article |
identifier | ISSN: 1057-7149; EISSN: 1941-0042; DOI: 10.1109/TIP.2023.3315151; PMID: 37725733 |
ispartof | IEEE transactions on image processing, 2023, Vol.32, p.5296-5309 |
issn | 1057-7149; 1941-0042 (electronic) |
language | eng |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy; adaptive inference; Animation; Computation; Convolution; Discrete wavelet transforms; dynamic neural networks; Dynamics; high efficiency; Image coding; Interpolation; Motion perception; Optical flow (image analysis); Redundancy; Synthesis; Task analysis; Video frame interpolation; Wavelet domain; wavelet transform |
title | Dynamic Frame Interpolation in Wavelet Domain |
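The abstract also describes embedding a classifier in the motion perception network so that each sample gets its own threshold instead of a fixed one. Below is a hypothetical sketch of such a head; the class name, the candidate ratio set, and the Gumbel-softmax training trick are illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThresholdClassifier(nn.Module):
    """Picks one threshold ratio per sample from a discrete candidate set,
    based on globally pooled motion features. Illustrative sketch only."""
    def __init__(self, in_channels: int, ratios=(0.0, 0.02, 0.05, 0.1)):
        super().__init__()
        self.register_buffer("ratios", torch.tensor(ratios))
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global spatial context
            nn.Flatten(),
            nn.Linear(in_channels, len(ratios)),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        logits = self.head(feat)
        if self.training:
            # Differentiable hard choice, so the selection can be trained
            # jointly with a combined accuracy/computation objective.
            one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)
        else:
            one_hot = F.one_hot(logits.argmax(dim=1), self.ratios.numel()).float()
        return (one_hot * self.ratios).sum(dim=1)  # shape: (batch,)

# Usage: one learned threshold ratio per input sample.
model = ThresholdClassifier(in_channels=64).eval()
feat = torch.randn(4, 64, 32, 32)  # stand-in motion-perception features
print(model(feat))                 # e.g. tensor([0.0500, 0.0000, 0.1000, 0.0200])
```

Selecting from a small discrete set keeps the decision cheap and lets the chosen ratio drive the per-scale sparse masks shown in the first sketch, which is how a per-sample threshold can trade computation for accuracy at inference time.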