Divide-and-Conquer Completion Network for Video Inpainting
Saved in:
Published in: | IEEE Transactions on Circuits and Systems for Video Technology, 2023-06, Vol. 33 (6), pp. 2753-2766 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Video inpainting aims to complete missing regions in a video with plausible content. Different components call for different reconstruction targets, e.g., smoothness preservation for flat regions and sharpening for edges and textures. Typically, existing methods treat the missing regions as a whole and train the model holistically by optimizing homogeneous pixel-wise losses (e.g., MSE). Trained this way, models are easily dominated by flat regions and fail to infer realistic details (edges and textures) that are difficult to reconstruct but necessary for practical applications. In this paper, we propose a divide-and-conquer completion network for video inpainting. In particular, our network first uses the discrete wavelet transform to decompose deep features into low-frequency components containing structural information (flat regions) and high-frequency components carrying detailed texture information. We then feed these components into separate branches and adopt a temporal attention feature aggregation module to generate the missing contents for each branch separately. This enables flexible supervision of each component via an intermediate supervision learning strategy, which has not been noticed or explored by current state-of-the-art video inpainting methods. Furthermore, we adopt a gradient-weighted reconstruction loss to supervise the completed-frame reconstruction process; it uses gradients in all directions of the video frame to emphasize hard-to-reconstruct texture regions, making the model pay more attention to complex detailed textures. Extensive experiments validate the superior performance of our divide-and-conquer model over state-of-the-art baselines in both quantitative and qualitative evaluations. |
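The two key ingredients named in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses a one-level Haar transform as a representative discrete wavelet transform on a single 2D array (the paper applies the decomposition to deep feature maps), and the exact form of the gradient-weighted loss (`1 + lam * |gradient|` weighting of an L1 term) is an assumption chosen only to show the up-weighting of edge and texture pixels.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT (orthonormal): split a map with even height
    and width into a low-frequency band LL (structure / flat regions) and
    three high-frequency bands LH, HL, HH (edge and texture detail)."""
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0   # local average: structure
    lh = (a + b - c - d) / 2.0   # horizontal-edge detail
    hl = (a - b + c - d) / 2.0   # vertical-edge detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

def gradient_weighted_l1(pred, gt, lam=1.0):
    """Hypothetical gradient-weighted reconstruction loss: pixels where the
    ground truth has large gradients (edges, textures) receive weight > 1,
    so errors there cost more than errors in flat regions."""
    gy = np.abs(np.diff(gt, axis=0, prepend=gt[:1, :]))   # vertical gradient
    gx = np.abs(np.diff(gt, axis=1, prepend=gt[:, :1]))   # horizontal gradient
    weight = 1.0 + lam * (gx + gy)
    return float(np.mean(weight * np.abs(pred - gt)))
```

Because the Haar transform above is orthonormal, the four sub-bands preserve the energy of the input, so no information is lost when the branches are later recombined; a constant (perfectly flat) input lands entirely in LL with all detail bands zero.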
ISSN: | 1051-8215 1558-2205 |
DOI: | 10.1109/TCSVT.2022.3225911 |