Divide-and-Conquer Completion Network for Video Inpainting
Video inpainting aims to utilize plausible contents to complete missing regions in the video. For different components, the reconstruction targets of missing regions are different, e.g., smoothness preserving for flat regions and sharpening for edges and textures. Typically, existing methods treat the mi...
Saved in:
Published in: | IEEE transactions on circuits and systems for video technology 2023-06, Vol.33 (6), p.2753-2766 |
---|---|
Main authors: | Wu, Zhiliang ; Sun, Changchang ; Xuan, Hanyu ; Zhang, Kang ; Yan, Yan |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 2766 |
---|---|
container_issue | 6 |
container_start_page | 2753 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 33 |
creator | Wu, Zhiliang Sun, Changchang Xuan, Hanyu Zhang, Kang Yan, Yan |
description | Video inpainting aims to utilize plausible contents to complete missing regions in the video. For different components, the reconstruction targets of missing regions are different, e.g., smoothness preserving for flat regions and sharpening for edges and textures. Typically, existing methods treat the missing regions as a whole and holistically train the model by optimizing homogeneous pixel-wise losses (e.g., MSE). In this way, the trained models will be easily dominated and determined by flat regions, failing to infer realistic details (edges and textures) that are difficult to reconstruct but necessary for practical applications. In this paper, we propose a divide-and-conquer completion network for video inpainting. In particular, our network first uses the discrete wavelet transform to decompose the deep features into low-frequency components containing structural information (flat regions) and high-frequency components involving detailed texture information. Thereafter, we feed these components into different branches and adopt the temporal attention feature aggregation module to generate the missing contents separately. Hence, it can realize flexible supervision by utilizing an intermediate supervision learning strategy for each component, which has not been noticed or explored by current state-of-the-art video inpainting methods. Furthermore, we adopt a gradient-weighted reconstruction loss to supervise the completed frame reconstruction process, which uses the gradients in all directions of the video frame to emphasize the difficult-to-reconstruct texture regions, making the model pay more attention to complex detailed textures. Extensive experiments validate the superior performance of our divide-and-conquer model over state-of-the-art baselines in both quantitative and qualitative evaluations. |
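The description names two concrete mechanisms: a discrete wavelet transform that splits features into a structural (low-frequency) band and detail (high-frequency) bands, and a gradient-weighted reconstruction loss that up-weights edge/texture pixels. The sketch below illustrates both ideas on plain NumPy arrays; it is not the authors' implementation (which operates on deep features inside the network), and the Haar filter choice and the `1 + |∇target|` weighting form are assumptions for illustration only.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar DWT of an array x with even height and width.
    Returns (LL, (LH, HL, HH)): LL carries the smooth/structural content,
    the three detail bands carry horizontal/vertical/diagonal texture."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency (structure)
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, (lh, hl, hh)

def gradient_weighted_l1(pred, target, eps=1.0):
    """L1 reconstruction loss re-weighted by the target's gradient
    magnitude, so edge/texture pixels contribute more than flat ones.
    The weight form 1 + |grad|/eps is an illustrative assumption."""
    gy, gx = np.gradient(target)           # gradients along both axes
    w = 1.0 + np.hypot(gx, gy) / eps       # weight >= 1, larger on edges
    return float(np.mean(w * np.abs(pred - target)))
```

Note that the four Haar bands are invertible (`(LL + LH + HL + HH) / 2` recovers the top-left sample of each 2x2 block), so splitting features this way loses no information; each band can then be completed and supervised in its own branch.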
doi_str_mv | 10.1109/TCSVT.2022.3225911 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2023-06, Vol.33 (6), p.2753-2766 |
issn | 1051-8215 1558-2205 |
language | eng |
recordid | cdi_ieee_primary_9967838 |
source | IEEE Electronic Library (IEL) |
subjects | Adaptation models ; Discrete Wavelet Transform ; Discrete wavelet transforms ; divide-and-conquer ; gradient-weighted reconstruction loss ; Image reconstruction ; Reconstruction ; Smoothness ; Task analysis ; Three-dimensional displays ; Transformers ; Transforms ; Video inpainting ; Wavelet transforms |
title | Divide-and-Conquer Completion Network for Video Inpainting |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T17%3A40%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Divide-and-Conquer%20Completion%20Network%20for%20Video%20Inpainting&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Wu,%20Zhiliang&rft.date=2023-06-01&rft.volume=33&rft.issue=6&rft.spage=2753&rft.epage=2766&rft.pages=2753-2766&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2022.3225911&rft_dat=%3Cproquest_RIE%3E2823194321%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2823194321&rft_id=info:pmid/&rft_ieee_id=9967838&rfr_iscdi=true |