Semi-Supervised Video Inpainting with Cycle Consistency Constraints
Format: Article
Language: English
Online Access: Order full text
Abstract: Deep learning-based video inpainting has yielded promising results and gained increasing attention from researchers. These methods generally assume that the corrupted-region masks of each frame are known and easily obtained. However, annotating these masks is labor-intensive and expensive, which limits the practical application of current methods. We therefore relax this assumption by defining a new semi-supervised inpainting setting, in which the network must complete the corrupted regions of the whole video given the annotated mask of only one frame. Specifically, we propose an end-to-end trainable framework consisting of a completion network and a mask prediction network, which respectively generate the corrupted contents of the current frame using the known mask and determine the regions to be filled in the next frame. In addition, we introduce a cycle consistency loss to regularize the training of these two networks. In this way, the completion network and the mask prediction network constrain each other, so the overall performance of the trained model can be maximized. Furthermore, because of the prior knowledge they naturally contain (e.g., corrupted contents and clear borders), existing video inpainting datasets are not suitable for semi-supervised video inpainting. We therefore create a new dataset by simulating corrupted videos of real-world scenarios. Extensive experimental results demonstrate the superiority of our model on the video inpainting task. Remarkably, although our model is trained in a semi-supervised manner, it achieves performance comparable to fully-supervised methods.
DOI: 10.48550/arxiv.2208.06807
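The abstract only sketches the framework at a high level, so the following is a minimal, illustrative PyTorch sketch of how the two networks and the cycle consistency term could fit together. The module architectures, tensor shapes, loss weighting, and function names are assumptions for illustration only, not the authors' implementation.

```python
# Hypothetical sketch of the semi-supervised setup described in the abstract:
# a completion network fills frame t using its known mask, a mask prediction
# network proposes the mask for frame t+1, and a cycle term asks the mask
# predicted back for frame t to agree with the mask we started from.
import torch
import torch.nn as nn


class CompletionNet(nn.Module):
    """Placeholder completion network: fills corrupted regions of a frame."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, frame, mask):
        # Zero out the corrupted region and append the mask as a fourth channel.
        return self.body(torch.cat([frame * (1 - mask), mask], dim=1))


class MaskPredictionNet(nn.Module):
    """Placeholder mask prediction network: locates corrupted regions in the next frame."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, completed_frame, next_frame):
        return self.body(torch.cat([completed_frame, next_frame], dim=1))


completion_net, mask_net = CompletionNet(), MaskPredictionNet()
l1 = nn.L1Loss()


def training_step(frame_t, mask_t, frame_t1, gt_t):
    """One illustrative step on a pair of consecutive frames (t, t+1)."""
    # 1) Complete frame t with its known (annotated or previously propagated) mask.
    completed_t = completion_net(frame_t, mask_t)
    # 2) Predict the corrupted region of frame t+1 from the completed frame t.
    mask_t1 = mask_net(completed_t, frame_t1)
    # 3) Complete frame t+1 using the predicted mask.
    completed_t1 = completion_net(frame_t1, mask_t1)
    # 4) Cycle back: predict frame t's mask from the completed frame t+1 and
    #    require it to match the mask we started from.
    mask_t_cycle = mask_net(completed_t1, frame_t)
    # Reconstruction on the supervised frame plus the cycle consistency term
    # (equal weighting here is an arbitrary illustrative choice).
    return l1(completed_t, gt_t) + l1(mask_t_cycle, mask_t)
```

Under this reading, only one frame per video needs an annotated mask: the mask prediction network propagates it forward frame by frame, while the cycle term keeps the two networks consistent with each other, matching the abstract's claim that they "constrain each other."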