SAMControl: Controlling Pose and Object for Image Editing with Soft Attention Mask
To achieve content-consistent results in text-conditioned image editing, existing methods typically employ a reconstruction branch to capture the source image details via diffusion inversion and a generation branch to synthesize the target image based on the given textual prompt and the masked sourc...
Gespeichert in:
Veröffentlicht in: | ACM transactions on multimedia computing communications and applications 2024-11 |
---|---|
Hauptverfasser: | , , , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | To achieve content-consistent results in text-conditioned image editing, existing methods typically employ a reconstruction branch to capture the source image details via diffusion inversion and a generation branch to synthesize the target image based on the given textual prompt and the masked source image details. However, accurately segmenting source details is challenging with the current fixed-threshold mask strategy. Additionally, the inadequacies in the inversion process can lead to insufficient retention of source details. In this paper, we propose a method called SAMControl (Soft Attention Mask) to adaptively control the pose and object details for image editing. SAMControl dynamically learns flexible attention masks for different images at various diffusion steps. Furthermore, in the reconstruction branch, we utilize a direct inversion technique to ensure the fidelity of source details within SAM. Extensive qualitative and quantitative results demonstrate the effectiveness of the proposed method. |
---|---|
ISSN: | 1551-6857 1551-6865 |
DOI: | 10.1145/3702999 |