DFU: scale-robust diffusion model for zero-shot super-resolution image generation
Diffusion generative models have achieved remarkable success in generating images with a fixed resolution. However, existing models have limited ability to generalize to different resolutions when training data at those resolutions are not available. Leveraging techniques from operator learning, we...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Diffusion generative models have achieved remarkable success in generating
images with a fixed resolution. However, existing models have limited ability
to generalize to different resolutions when training data at those resolutions
are not available. Leveraging techniques from operator learning, we present a
novel deep-learning architecture, Dual-FNO UNet (DFU), which approximates the
score operator by combining both spatial and spectral information at multiple
resolutions. Comparisons of DFU to baselines demonstrate its scalability: 1)
simultaneously training on multiple resolutions improves FID over training at
any single fixed resolution; 2) DFU generalizes beyond its training
resolutions, allowing for coherent, high-fidelity generation at
higher-resolutions with the same model, i.e. zero-shot super-resolution
image-generation; 3) we propose a fine-tuning strategy to further enhance the
zero-shot super-resolution image-generation capability of our model, leading to
a FID of 11.3 at 1.66 times the maximum training resolution on FFHQ, which no
other method can come close to achieving. |
---|---|
DOI: | 10.48550/arxiv.2401.06144 |