Divide-and-Conquer Completion Network for Video Inpainting

Bibliographic Details

Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2023-06, Vol. 33 (6), p. 2753-2766
Main authors: Wu, Zhiliang; Sun, Changchang; Xuan, Hanyu; Zhang, Kang; Yan, Yan
Format: Article
Language: English
Full Description

Video inpainting aims to complete the missing regions of a video with plausible content. The reconstruction targets differ across components, e.g., smoothness preservation for flat regions versus sharpening for edges and textures. Typically, existing methods treat the missing regions as a whole and train the model holistically by optimizing homogeneous pixel-wise losses (e.g., MSE). As a result, the trained models are easily dominated by the flat regions and fail to infer the realistic details (edges and textures) that are difficult to reconstruct but necessary for practical applications. In this paper, we propose a divide-and-conquer completion network for video inpainting. In particular, our network first uses the discrete wavelet transform to decompose deep features into low-frequency components carrying structural information (flat regions) and high-frequency components carrying detailed texture information.
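As a concrete illustration of the decomposition step, the sketch below implements a single-level 2D Haar DWT over a feature map in PyTorch. The Haar basis, the function name, and the (N, C, H, W) layout are assumptions made for illustration; the abstract does not specify which wavelet or implementation the paper uses.

```python
import torch

def haar_dwt2d(feat: torch.Tensor):
    """Single-level 2D Haar DWT over the spatial dims of a feature map.

    feat: (N, C, H, W) with even H and W. Returns the low-frequency band
    LL (coarse structure, for the flat-region branch) and the three
    high-frequency bands LH, HL, HH (edges and textures, for the detail
    branch), each of shape (N, C, H/2, W/2).
    """
    a = feat[..., 0::2, 0::2]  # top-left pixel of every 2x2 block
    b = feat[..., 0::2, 1::2]  # top-right
    c = feat[..., 1::2, 0::2]  # bottom-left
    d = feat[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2   # local average: structure / flat regions
    lh = (a + b - c - d) / 2   # horizontal detail
    hl = (a - b + c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, (lh, hl, hh)
```

Under this reading, `ll` would feed the structure branch while the three detail bands would feed the texture branch.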
Thereafter, we feed these components into separate branches and adopt a temporal attention feature aggregation module to generate the missing content of each component independently. This design enables flexible supervision through an intermediate supervision learning strategy for each component, a possibility that has not been noticed or explored by current state-of-the-art video inpainting methods.
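One plausible reading of such a module is cross-attention from the target frame's features to the features of its reference frames, as in the hedged PyTorch sketch below; the function name, tensor shapes, and scaled dot-product form are assumptions, and the paper's actual module may differ.

```python
import torch

def temporal_attention_aggregate(target: torch.Tensor,
                                 refs: torch.Tensor) -> torch.Tensor:
    """Aggregate reference-frame features into the target frame via
    temporal attention.

    target: (N, C, H, W) features of the frame being completed.
    refs:   (N, T, C, H, W) features of T reference frames.
    Returns an aggregated feature map of shape (N, C, H, W).
    """
    n, t, c, h, w = refs.shape
    q = target.flatten(2).transpose(1, 2)           # (N, H*W, C) queries
    k = refs.permute(0, 2, 1, 3, 4).flatten(2)      # (N, C, T*H*W) keys
    v = k.transpose(1, 2)                           # (N, T*H*W, C) values
    attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # (N, H*W, T*H*W)
    out = (attn @ v).transpose(1, 2).reshape(n, c, h, w)
    return out
```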
Furthermore, we adopt a gradient-weighted reconstruction loss to supervise the reconstruction of the completed frames. It uses the gradients of the video frame in all directions to emphasize the texture regions that are difficult to reconstruct, making the model pay more attention to complex detailed textures. Extensive experiments validate the superior performance of our divide-and-conquer model over state-of-the-art baselines in both quantitative and qualitative evaluations.
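A minimal sketch of such a loss, assuming the per-pixel weight grows as 1 + alpha * |grad(gt)| and simplifying "all directions" to horizontal and vertical finite differences, is shown below; alpha and the exact weighting form are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def gradient_weighted_loss(pred: torch.Tensor,
                           gt: torch.Tensor,
                           alpha: float = 4.0) -> torch.Tensor:
    """Pixel-wise L1 loss weighted by the gradient magnitude of the
    ground-truth frame, so edge/texture pixels contribute more.

    pred, gt: (N, C, H, W). alpha (an assumed hyperparameter) scales
    how strongly high-gradient regions are emphasized.
    """
    # Finite differences along each spatial axis, zero-padded to keep shape.
    dx = F.pad(gt[..., :, 1:] - gt[..., :, :-1], (0, 1))
    dy = F.pad(gt[..., 1:, :] - gt[..., :-1, :], (0, 0, 0, 1))
    grad_mag = (dx ** 2 + dy ** 2).sqrt().mean(dim=1, keepdim=True)
    weight = 1.0 + alpha * grad_mag   # flat regions keep weight 1
    return (weight * (pred - gt).abs()).mean()
```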

DOI: 10.1109/TCSVT.2022.3225911
ISSN: 1051-8215
EISSN: 1558-2205
CODEN: ITCTEM
Publisher: New York: IEEE
Source: IEEE Electronic Library (IEL)
ORCID: https://orcid.org/0000-0002-4633-2794; https://orcid.org/0000-0001-8096-0914; https://orcid.org/0000-0002-6597-8048

Subjects:
Adaptation models
Discrete Wavelet Transform
Discrete wavelet transforms
divide-and-conquer
gradient-weighted reconstruction loss
Image reconstruction
Reconstruction
Smoothness
Task analysis
Three-dimensional displays
Transformers
Transforms
Video inpainting
Wavelet transforms