Zero-shot High-fidelity and Pose-controllable Character Animation
Format: | Article |
Language: | eng |
Online access: | Order full text |
Abstract: | Image-to-video (I2V) generation aims to create a video sequence from a single
image, which requires high temporal coherence and visual fidelity. However,
existing approaches suffer from inconsistency of character appearances and poor
preservation of fine details. Moreover, they require a large amount of video
data for training, which can be computationally demanding. To address these
limitations, we propose PoseAnimate, a novel zero-shot I2V framework for
character animation. PoseAnimate contains three key components: 1) a Pose-Aware
Control Module (PACM) that incorporates diverse pose signals into text
embeddings, to preserve character-independent content and maintain precise
alignment of actions. 2) a Dual Consistency Attention Module (DCAM) that
enhances temporal consistency and retains character identity and intricate
background details. 3) a Mask-Guided Decoupling Module (MGDM) that decouples
the character from the background and refines their distinct feature
perception, improving animation fidelity. We also propose a Pose Alignment
Transition Algorithm (PATA) to ensure smooth action transitions. Extensive
experimental results demonstrate that our approach outperforms the
state-of-the-art training-based methods in terms of character consistency and
detail fidelity. Moreover, it maintains a high level of temporal coherence
throughout the generated animations. |
DOI: | 10.48550/arxiv.2404.13680 |
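The abstract names the PACM, DCAM, MGDM, and PATA components but does not describe their internals. Purely as an illustration of the general idea behind the Pose-Aware Control Module, i.e. incorporating pose signals into text embeddings, here is a minimal, hypothetical PyTorch sketch; the class name, dimensions, and fusion strategy are assumptions made for illustration and are not the paper's actual method.

```python
import torch
import torch.nn as nn

class PoseConditionedTextEmbedding(nn.Module):
    """Hypothetical sketch: fuse a per-frame pose signal into text-embedding space.

    This is NOT the PoseAnimate implementation; the abstract does not specify
    how PACM works internally. The module only illustrates the general idea of
    injecting pose signals into text embeddings used as conditioning.
    """

    def __init__(self, pose_dim=34, text_dim=768):
        super().__init__()
        # Project a flattened 2D pose (e.g. 17 keypoints x 2 coordinates)
        # into the same space as the text-encoder tokens.
        self.pose_proj = nn.Sequential(
            nn.Linear(pose_dim, text_dim),
            nn.SiLU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, text_emb, pose):
        # text_emb: (batch, tokens, text_dim) from a frozen text encoder
        # pose:     (batch, pose_dim) pose signal for one target frame
        pose_token = self.pose_proj(pose).unsqueeze(1)  # (batch, 1, text_dim)
        # Append the pose token so downstream cross-attention can condition on it.
        return torch.cat([text_emb, pose_token], dim=1)

# Minimal usage with random tensors
fuse = PoseConditionedTextEmbedding()
text_emb = torch.randn(2, 77, 768)  # e.g. CLIP-style text embeddings
pose = torch.randn(2, 34)           # flattened keypoints for one frame
cond = fuse(text_emb, pose)         # (2, 78, 768)
```

In a diffusion-based animation pipeline, such a fused embedding would typically replace the plain text embedding fed to the denoiser's cross-attention layers, so the generated frame follows both the prompt and the target pose.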