A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Image editing aims to edit the given synthetic or real image to meet the
specific requirements from users. It has been widely studied in recent years as a
promising and challenging field of Artificial Intelligence Generative Content
(AIGC). Recent significant advances in this field are based on the
development of text-to-image (T2I) diffusion models, which generate images
according to text prompts. These models demonstrate remarkable generative
capabilities and have become widely used tools for image editing. T2I-based
image editing methods significantly enhance editing performance and offer a
user-friendly interface for modifying content guided by multimodal inputs. In
this survey, we provide a comprehensive review of multimodal-guided image
editing techniques that leverage T2I diffusion models. First, we define the
scope of image editing from a holistic perspective and detail various control
signals and editing scenarios. We then propose a unified framework to formalize
the editing process, categorizing it into two primary algorithm families. This
framework offers a design space for users to achieve specific goals.
Subsequently, we present an in-depth analysis of each component within this
framework, examining the characteristics and applicable scenarios of different
combinations. Because training-based methods learn to map the source image
directly to the target image under user guidance, we discuss them separately and
introduce schemes for injecting the source image in different scenarios.
Additionally, we review the application of 2D techniques to video editing,
highlighting solutions for inter-frame inconsistency. Finally, we discuss open
challenges in the field and suggest potential future research directions. We
keep tracking related work at
https://github.com/xinchengshuai/Awesome-Image-Editing. |
DOI: | 10.48550/arxiv.2406.14555 |