SDE-RAE: CLIP-based realistic image reconstruction and editing network using stochastic differential diffusion
Published in: | Image and Vision Computing, 2023-11, Vol. 139, p. 104836, Article 104836 |
---|---|
Main authors: | Honggang Zhao et al. |
Format: | Article |
Language: | English |
Abstract: | Generative Adversarial Networks (GANs) have long dominated the field of image reconstruction and editing. A GAN trains a generator adversarially so that it can fool the discriminator, which enables the generated images to be of high quality. However, GANs are often difficult to train, and training is hard to bring to convergence. Each image style requires its own dataset and a complex optimization objective, and the training process is unpredictable. To address this problem, we propose SDE-RAE, a realistic image reconstruction and editing method based on a Stochastic Differential Equation (SDE), in which a diffusion model converts Gaussian noise into real photos through iterative denoising. For reconstruction, we only need to construct simple loss functions to achieve high-quality results. For editing, we propose a novel semantic-enhancement CLIP (Contrastive Language-Image Pre-Training) method that steers the direction of SDE parameter optimization, so that simple text is enough to achieve unique image edits. Our method generates high-quality images that retain the texture and contour features of the original image. Specifically, we take the initial image, perturb it by adding random noise, and then iteratively denoise it with the reverse SDE, manipulating the image's RGB pixels to achieve reconstruction and editing. Code and dataset: https://github.com/haizhu12/SDE-RAE.
• We propose a simple diffusion-model-based training method that enables image reconstruction in various abstract styles.
• We propose a CLIP-based semantic enhancement method that enables unique image editing.
• Our model outperforms the best GAN-based methods.
Honggang Zhao et al. |
---|---|
ISSN: | 0262-8856, 1872-8138 |
DOI: | 10.1016/j.imavis.2023.104836 |
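The abstract describes the reconstruction and editing loop only in words: perturb the initial image with random noise, then iteratively denoise it with the reverse SDE. The NumPy sketch that follows illustrates that general perturb-then-denoise pattern under stated assumptions and is not the authors' implementation (the linked repository holds that). It assumes a variance-exploding SDE with a geometric noise schedule (SIGMA_MIN, SIGMA_MAX), an Euler-Maruyama solver, and an analytic Gaussian score `toy_score` standing in for a trained diffusion model. The names `perturb_and_denoise`, `guidance_fn` and `guidance_scale` are hypothetical, and the additive guidance term only marks where a CLIP-derived gradient could enter; it is not the paper's semantic enhancement method.

```python
import numpy as np

# Assumed variance-exploding (VE) noise range; hypothetical values, not from the paper.
SIGMA_MIN, SIGMA_MAX = 0.01, 50.0


def sigma(t):
    """Noise standard deviation of the VE SDE at time t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def g(t):
    """Diffusion coefficient g(t) = sqrt(d sigma(t)^2 / dt) for the schedule above."""
    return sigma(t) * np.sqrt(2.0 * np.log(SIGMA_MAX / SIGMA_MIN))


def toy_score(x, t, mu, s):
    """Exact score of N(mu, s^2 I) data after VE noising to time t.

    Hypothetical stand-in for a trained score network s_theta(x, t).
    """
    return -(x - mu) / (s ** 2 + sigma(t) ** 2)


def perturb_and_denoise(x0, score_fn, t0=0.5, n_steps=500,
                        guidance_fn=None, guidance_scale=0.0, rng=None):
    """Add noise to x0 up to time t0, then integrate the reverse SDE to t = 0.

    guidance_fn(x, t) is an optional gradient added to the score; it is a
    generic placeholder for a text-guidance gradient (e.g. from CLIP), not the
    SDE-RAE formulation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Forward perturbation: x(t0) = x0 + sigma(t0) * z, z ~ N(0, I)
    x = x0 + sigma(t0) * rng.standard_normal(x0.shape)
    ts = np.linspace(t0, 0.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t - t_next  # positive step size in reverse time
        drift = score_fn(x, t)
        if guidance_fn is not None:
            drift = drift + guidance_scale * guidance_fn(x, t)
        z = rng.standard_normal(x.shape)
        # Euler-Maruyama step of the reverse-time VE SDE
        # dx = -g(t)^2 * score dt + g(t) dw_bar, integrated from t0 down to 0
        x = x + g(t) ** 2 * drift * dt + g(t) * np.sqrt(dt) * z
    return x
```

For example, `perturb_and_denoise(image, lambda x, t: toy_score(x, t, 0.5, 0.05), t0=0.3)` on an H×W×3 array returns an array of the same shape. In this sketch the starting time t0 sets the trade-off: a small t0 keeps the output close to the input image, while a large t0 lets the prior (here the toy Gaussian) dominate.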