CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Saved in:
| Main authors: | , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | eng |
| Subjects: | |
| Online access: | Order full text |
| Summary: | Recent advancements in video generation have been remarkable, yet many existing methods struggle with consistency and poor text-video alignment. Moreover, the field lacks effective techniques for text-guided video inpainting, in stark contrast to the well-explored domain of text-guided image inpainting. To this end, this paper proposes a novel text-guided video inpainting model that achieves better consistency, controllability, and compatibility. Specifically, we introduce a simple but efficient motion capture module to preserve motion consistency, design an instance-aware region selection instead of a random one to obtain better textual controllability, and utilize a novel strategy to inject personalized models into our CoCoCo model, thereby obtaining better model compatibility. Extensive experiments show that our model generates high-quality video clips with better motion consistency, textual controllability, and model compatibility. More details are available at [cococozibojia.github.io](https://cococozibojia.github.io). |
| DOI: | 10.48550/arxiv.2403.12035 |
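
The summary names three components, but this record contains no implementation details, so the sketches below are illustrative readings only, not the paper's actual design. For the "motion capture module", one plausible interpretation is attention along the time axis, so each spatial location can compare itself with the same location in other frames. The class name, tensor layout, and head count here are all assumptions:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Hypothetical stand-in for a motion-capture module: plain
    self-attention over the frame axis, one token per time step."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, height, width, channels)
        b, t, h, w, c = x.shape
        # Fold space into the batch so attention runs purely over time.
        tokens = x.permute(0, 2, 3, 1, 4).reshape(b * h * w, t, c)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(b, h, w, t, c).permute(0, 3, 1, 2, 4)
```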
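
Likewise, "instance-aware region selection instead of a random region selection" suggests drawing training masks from segmented object instances rather than arbitrary rectangles, so the masked region matches something a text prompt can actually name. A minimal sketch, assuming instance masks come from an off-the-shelf segmenter; both function names are hypothetical:

```python
import numpy as np

def random_region(frame_hw: tuple[int, int],
                  rng: np.random.Generator) -> np.ndarray:
    """Baseline: a random rectangle, which may cover no nameable object."""
    h, w = frame_hw
    y0 = int(rng.integers(0, h - 1)); y1 = int(rng.integers(y0 + 1, h + 1))
    x0 = int(rng.integers(0, w - 1)); x1 = int(rng.integers(x0 + 1, w + 1))
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

def instance_aware_region(instance_masks: list[np.ndarray],
                          rng: np.random.Generator) -> np.ndarray:
    """Instance-aware variant: mask one whole segmented instance, so the
    inpainting target corresponds to an object a prompt can describe."""
    return instance_masks[int(rng.integers(len(instance_masks)))].astype(bool)
```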
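
Finally, the strategy to "inject personalized models into our CoCoCo model" could be realized as weight blending between a base checkpoint and a personalized text-to-image checkpoint. The abstract does not specify the mechanism, so the alpha-blend below is only one common approach; the function name and the default blend factor are assumptions:

```python
import torch

def inject_personalized_weights(base_sd: dict[str, torch.Tensor],
                                personal_sd: dict[str, torch.Tensor],
                                alpha: float = 0.7) -> dict[str, torch.Tensor]:
    """Blend each tensor that appears (with a matching shape) in both
    checkpoints; keys unique to the base model stay unchanged."""
    merged = dict(base_sd)
    for key, w_personal in personal_sd.items():
        w_base = base_sd.get(key)
        if w_base is not None and w_base.shape == w_personal.shape:
            merged[key] = (1.0 - alpha) * w_base + alpha * w_personal
    return merged
```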