EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
Saved in:
Main authors:
Format: Article
Language: English
Keywords:
Online access: Order full text
Abstract: Diffusion models have significantly improved the performance of image editing. Existing methods realize high-quality image editing through a variety of approaches, including but not limited to text control, dragging operations, and mask-and-inpainting. Among these, instruction-based editing stands out for its convenience and effectiveness in following human instructions across diverse scenarios. However, it still focuses on simple editing operations such as adding, replacing, or deleting, and falls short of understanding world dynamics, i.e., the realistic changes that unfold in the physical world. Therefore, this work, EditWorld, introduces a new editing task, namely world-instructed image editing, which defines and categorizes instructions grounded in various world scenarios. We curate a new image editing dataset with world instructions using a set of large pretrained models (e.g., GPT-3.5, Video-LLaVA, and SDXL). To enable sufficient simulation of world dynamics for image editing, EditWorld trains a model on the curated dataset and improves its instruction-following ability with a designed post-edit strategy. Extensive experiments demonstrate that our method significantly outperforms existing editing methods on this new task. Our dataset and code will be available at https://github.com/YangLing0818/EditWorld
DOI: 10.48550/arxiv.2405.14785
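
The abstract mentions curating world-instructed editing data with large pretrained models such as GPT-3.5 and SDXL. The sketch below is a minimal, hypothetical illustration of one text-driven branch of such a pipeline, not the authors' released code: it assumes an OpenAI API key and local SDXL weights via the diffusers library, and the prompt wording and the helper name `generate_triplet` are invented for illustration.

```python
# Hypothetical sketch of a text-driven data-curation step: a language model
# proposes (before caption, world instruction, after caption), and SDXL renders
# the before/after images. Not the EditWorld pipeline itself.
import json

import torch
from diffusers import StableDiffusionXLPipeline
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_triplet(scenario: str) -> dict:
    """Ask GPT-3.5 for a caption/instruction/caption triple for one scenario."""
    prompt = (
        f"For the scenario '{scenario}', return a JSON object with keys "
        "'before', 'instruction', 'after'. 'before' describes an image, "
        "'instruction' is an edit grounded in real-world dynamics, and "
        "'after' describes the image once the instruction has taken effect."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request parseable JSON
    )
    return json.loads(resp.choices[0].message.content)


# Render the before/after image pair with SDXL.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

triplet = generate_triplet("a glass falling off a table")
before_img = pipe(triplet["before"]).images[0]  # source image
after_img = pipe(triplet["after"]).images[0]    # target image after the edit
before_img.save("before.png")
after_img.save("after.png")
```

In practice such pairs would still need filtering or manual checking for visual consistency between the before and after images before they could serve as editing supervision.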