MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Saved in:
Main Authors: | |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |

Abstract: | This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion. Despite achieving reasonable results, these approaches face challenges in maintaining temporal consistency throughout the animation due to the lack of temporal modeling and poor preservation of the reference identity. In this work, we introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving the reference image faithfully, and improving animation fidelity. To achieve this, we first develop a video diffusion model to encode temporal information. Second, to maintain appearance coherence across frames, we introduce a novel appearance encoder to retain the intricate details of the reference image. Leveraging these two innovations, we further employ a simple video fusion technique to encourage smooth transitions for long video animation. Empirical results demonstrate the superiority of our method over baseline approaches on two benchmarks. Notably, our approach outperforms the strongest baseline by over 38% in terms of video fidelity on the challenging TikTok dancing dataset. Code and model will be made available. |
---|---|
DOI: | 10.48550/arxiv.2311.16498 |
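
The abstract's long-video fusion step (generating overlapping temporal segments and blending them so transitions between segments stay smooth) can be pictured with a short sketch. The NumPy example below is only an illustrative assumption of such a scheme, using a simple linear cross-fade over the overlapping frames; the function name `fuse_segments` and the `overlap` parameter are hypothetical and are not taken from the MagicAnimate implementation.

```python
import numpy as np

def fuse_segments(segments, overlap):
    """Blend consecutive video segments that share `overlap` frames.

    segments: list of arrays with shape (T, H, W, C); consecutive segments
              are assumed to overlap by `overlap` frames.
    Returns one array covering the full video, with the overlapping frames
    linearly cross-faded to smooth the transition between segments.
    """
    fused = segments[0].astype(np.float32)
    for seg in segments[1:]:
        seg = seg.astype(np.float32)
        # Linear cross-fade weights for the overlapping frames:
        # 0 -> keep the previous segment, 1 -> keep the new segment.
        w = np.linspace(0.0, 1.0, overlap).reshape(-1, 1, 1, 1)
        blended = (1.0 - w) * fused[-overlap:] + w * seg[:overlap]
        fused = np.concatenate([fused[:-overlap], blended, seg[overlap:]], axis=0)
    return fused

# Example: three 16-frame segments with a 4-frame overlap -> 40 fused frames.
segs = [np.random.rand(16, 64, 64, 3) for _ in range(3)]
video = fuse_segments(segs, overlap=4)
print(video.shape)  # (40, 64, 64, 3)
```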