BrushEdit: All-In-One Image Inpainting and Editing
Image editing has advanced significantly with the development of diffusion models using both inversion-based and instruction-based methods. However, current inversion-based approaches struggle with big modifications (e.g., adding or removing objects) due to the structured nature of inversion noise,...
Gespeichert in:
Hauptverfasser: | , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Image editing has advanced significantly with the development of diffusion
models using both inversion-based and instruction-based methods. However,
current inversion-based approaches struggle with big modifications (e.g.,
adding or removing objects) due to the structured nature of inversion noise,
which hinders substantial changes. Meanwhile, instruction-based methods often
constrain users to black-box operations, limiting direct interaction for
specifying editing regions and intensity. To address these limitations, we
propose BrushEdit, a novel inpainting-based instruction-guided image editing
paradigm, which leverages multimodal large language models (MLLMs) and image
inpainting models to enable autonomous, user-friendly, and interactive
free-form instruction editing. Specifically, we devise a system enabling
free-form instruction editing by integrating MLLMs and a dual-branch image
inpainting model in an agent-cooperative framework to perform editing category
classification, main object identification, mask acquisition, and editing area
inpainting. Extensive experiments show that our framework effectively combines
MLLMs and inpainting models, achieving superior performance across seven
metrics including mask region preservation and editing effect coherence. |
---|---|
DOI: | 10.48550/arxiv.2412.10316 |