Divide-and-Conquer Completion Network for Video Inpainting

Bibliographic Details

Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2023-06, Vol. 33 (6), p. 2753-2766
Main authors: Wu, Zhiliang; Sun, Changchang; Xuan, Hanyu; Zhang, Kang; Yan, Yan
Format: Article
Language: English
Full Description

Video inpainting aims to complete the missing regions of a video with plausible content. The reconstruction targets differ across components, e.g., smoothness preservation for flat regions versus sharpening for edges and textures. Typically, existing methods treat the missing regions as a whole and train the model holistically by optimizing homogeneous pixel-wise losses (e.g., MSE). As a result, the trained models are easily dominated by the flat regions and fail to infer the realistic details (edges and textures) that are difficult to reconstruct but necessary for practical applications. In this paper, we propose a divide-and-conquer completion network for video inpainting. In particular, our network first uses the discrete wavelet transform to decompose deep features into low-frequency components carrying structural information (flat regions) and high-frequency components carrying detailed texture information.
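As a concrete illustration of the decomposition step, the sketch below implements a single-level 2D Haar DWT over a feature map in PyTorch. The Haar basis, the function name, and the (N, C, H, W) layout are assumptions made for illustration; the abstract does not specify which wavelet or implementation the paper uses.

```python
import torch

def haar_dwt2d(feat: torch.Tensor):
    """Single-level 2D Haar DWT over the spatial dims of a feature map.

    feat: (N, C, H, W) with even H and W. Returns the low-frequency band
    LL (coarse structure, for the flat-region branch) and the three
    high-frequency bands LH, HL, HH (edges and textures, for the detail
    branch), each of shape (N, C, H/2, W/2).
    """
    a = feat[..., 0::2, 0::2]  # top-left pixel of every 2x2 block
    b = feat[..., 0::2, 1::2]  # top-right
    c = feat[..., 1::2, 0::2]  # bottom-left
    d = feat[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2   # local average: structure / flat regions
    lh = (a + b - c - d) / 2   # horizontal detail
    hl = (a - b + c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, (lh, hl, hh)
```

Under this reading, `ll` would feed the structure branch while the three detail bands would feed the texture branch.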
Thereafter, we feed these components into separate branches and adopt a temporal attention feature aggregation module to generate the missing content of each component independently. This design enables flexible supervision through an intermediate supervision learning strategy for each component, a possibility that has not been noticed or explored by current state-of-the-art video inpainting methods.
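One plausible reading of such a module is cross-attention from the target frame's features to the features of its reference frames, as in the hedged PyTorch sketch below; the function name, tensor shapes, and scaled dot-product form are assumptions, and the paper's actual module may differ.

```python
import torch

def temporal_attention_aggregate(target: torch.Tensor,
                                 refs: torch.Tensor) -> torch.Tensor:
    """Aggregate reference-frame features into the target frame via
    temporal attention.

    target: (N, C, H, W) features of the frame being completed.
    refs:   (N, T, C, H, W) features of T reference frames.
    Returns an aggregated feature map of shape (N, C, H, W).
    """
    n, t, c, h, w = refs.shape
    q = target.flatten(2).transpose(1, 2)           # (N, H*W, C) queries
    k = refs.permute(0, 2, 1, 3, 4).flatten(2)      # (N, C, T*H*W) keys
    v = k.transpose(1, 2)                           # (N, T*H*W, C) values
    attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # (N, H*W, T*H*W)
    out = (attn @ v).transpose(1, 2).reshape(n, c, h, w)
    return out
```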
Furthermore, we adopt a gradient-weighted reconstruction loss to supervise the reconstruction of the completed frames. It uses the gradients of the video frame in all directions to emphasize the texture regions that are difficult to reconstruct, making the model pay more attention to complex detailed textures. Extensive experiments validate the superior performance of our divide-and-conquer model over state-of-the-art baselines in both quantitative and qualitative evaluations.
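A minimal sketch of such a loss, assuming the per-pixel weight grows as 1 + alpha * |grad(gt)| and simplifying "all directions" to horizontal and vertical finite differences, is shown below; alpha and the exact weighting form are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def gradient_weighted_loss(pred: torch.Tensor,
                           gt: torch.Tensor,
                           alpha: float = 4.0) -> torch.Tensor:
    """Pixel-wise L1 loss weighted by the gradient magnitude of the
    ground-truth frame, so edge/texture pixels contribute more.

    pred, gt: (N, C, H, W). alpha (an assumed hyperparameter) scales
    how strongly high-gradient regions are emphasized.
    """
    # Finite differences along each spatial axis, zero-padded to keep shape.
    dx = F.pad(gt[..., :, 1:] - gt[..., :, :-1], (0, 1))
    dy = F.pad(gt[..., 1:, :] - gt[..., :-1, :], (0, 0, 0, 1))
    grad_mag = (dx ** 2 + dy ** 2).sqrt().mean(dim=1, keepdim=True)
    weight = 1.0 + alpha * grad_mag   # flat regions keep weight 1
    return (weight * (pred - gt).abs()).mean()
```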

DOI: 10.1109/TCSVT.2022.3225911
ISSN: 1051-8215
EISSN: 1558-2205
CODEN: ITCTEM
Publisher: New York: IEEE
Source: IEEE Electronic Library (IEL)
ORCID: https://orcid.org/0000-0002-4633-2794; https://orcid.org/0000-0001-8096-0914; https://orcid.org/0000-0002-6597-8048

Subjects:
Adaptation models
Discrete Wavelet Transform
Discrete wavelet transforms
divide-and-conquer
gradient-weighted reconstruction loss
Image reconstruction
Reconstruction
Smoothness
Task analysis
Three-dimensional displays
Transformers
Transforms
Video inpainting
Wavelet transforms