A Novel Sampling Scheme for Text- and Image-Conditional Image Synthesis in Quantized Latent Spaces
Recent advancements in the domain of text-to-image synthesis have culminated in a multitude of enhancements pertaining to quality, fidelity, and diversity. Contemporary techniques enable the generation of highly intricate visuals which rapidly approach near-photorealistic quality. Nevertheless, as p...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Recent advancements in the domain of text-to-image synthesis have culminated
in a multitude of enhancements pertaining to quality, fidelity, and diversity.
Contemporary techniques enable the generation of highly intricate visuals which
rapidly approach near-photorealistic quality. Nevertheless, as progress is
achieved, the complexity of these methodologies increases, consequently
intensifying the comprehension barrier between individuals within the field and
those external to it.
In an endeavor to mitigate this disparity, we propose a streamlined approach
for text-to-image generation, which encompasses both the training paradigm and
the sampling process. Despite its remarkable simplicity, our method yields
aesthetically pleasing images with few sampling iterations, allows for
intriguing ways for conditioning the model, and imparts advantages absent in
state-of-the-art techniques. To demonstrate the efficacy of this approach in
achieving outcomes comparable to existing works, we have trained a one-billion
parameter text-conditional model, which we refer to as "Paella". In the
interest of fostering future exploration in this field, we have made our source
code and models publicly accessible for the research community. |
---|---|
DOI: | 10.48550/arxiv.2211.07292 |