SAMControl: Controlling Pose and Object for Image Editing with Soft Attention Mask

To achieve content-consistent results in text-conditioned image editing, existing methods typically employ a reconstruction branch to capture the source image details via diffusion inversion and a generation branch to synthesize the target image based on the given textual prompt and the masked sourc...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	ACM transactions on multimedia computing communications and applications 2024-11
Hauptverfasser:	Zhang, Yue, Wang, Chao, Fang, Feifei, Zhuge, Yunzhi, Fan, Hehe, Chang, Xiaojun, Deng, Cheng, Yang, Yi
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer vision Computing methodologies
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	To achieve content-consistent results in text-conditioned image editing, existing methods typically employ a reconstruction branch to capture the source image details via diffusion inversion and a generation branch to synthesize the target image based on the given textual prompt and the masked source image details. However, accurately segmenting source details is challenging with the current fixed-threshold mask strategy. Additionally, the inadequacies in the inversion process can lead to insufficient retention of source details. In this paper, we propose a method called SAMControl (Soft Attention Mask) to adaptively control the pose and object details for image editing. SAMControl dynamically learns flexible attention masks for different images at various diffusion steps. Furthermore, in the reconstruction branch, we utilize a direct inversion technique to ensure the fidelity of source details within SAM. Extensive qualitative and quantitative results demonstrate the effectiveness of the proposed method.
ISSN:	1551-6857 1551-6865
DOI:	10.1145/3702999