Fine Tuning Text-to-Image Diffusion Models for Correcting Anomalous Images
Format: Article
Language: English
Abstract: Since the advent of GANs and VAEs, image generation models have continuously evolved, opening up various real-world applications with the introduction of the Stable Diffusion and DALL-E models. These text-to-image models can generate high-quality images for fields such as art, design, and advertising. However, they often produce aberrant images for certain prompts. This study proposes a method to mitigate such issues by fine-tuning the Stable Diffusion 3 model using the DreamBooth technique. Experimental results targeting the prompt "lying on the grass/street" demonstrate that the fine-tuned model shows improved performance in visual evaluation and in metrics such as the Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Fréchet Inception Distance (FID). User surveys also indicated a higher preference for the fine-tuned model. This research is expected to contribute to enhancing the practicality and reliability of text-to-image models.
DOI: 10.48550/arxiv.2409.16174
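
The record contains no code, but as a hedged illustration of the setup the abstract describes, the sketch below shows how a DreamBooth fine-tuned Stable Diffusion 3 checkpoint could be loaded and sampled with the Hugging Face diffusers library. The checkpoint path, prompt wording, and sampling parameters are placeholders and assumptions, not details taken from the paper.

```python
# Minimal inference sketch, assuming a DreamBooth fine-tuned SD3 checkpoint
# saved locally. The path below is a hypothetical output directory.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "path/to/dreambooth-finetuned-sd3",  # placeholder, not from the paper
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# One of the prompt families the paper evaluates: "lying on the grass/street".
image = pipe(
    "a photo of a person lying on the grass",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("lying_on_grass.png")
```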