The Right Losses for the Right Gains: Improving the Semantic Consistency of Deep Text-to-Image Generation with Distribution-Sensitive Losses
Format: Article
Language: English
Online Access: Order full text
Abstract:

One of the major challenges in training deep neural networks for text-to-image generation is the significant linguistic discrepancy among the ground-truth captions of each image in most popular datasets. The large differences in word choice across such captions result in synthesized images that are semantically dissimilar to each other and to their ground-truth counterparts. Moreover, existing models either fail to generate the fine-grained details of the image or require a huge number of parameters, which renders them inefficient for text-to-image synthesis. To fill this gap in the literature, we propose using a contrastive learning approach with a novel combination of two loss functions: a fake-to-fake loss to increase the semantic consistency between images generated from the same caption, and a fake-to-real loss to reduce the gap between the distributions of real and generated images. We test this approach on two baseline models: SSAGAN and AttnGAN (with style blocks to enhance the fine-grained details of the images). Results show that our approach improves the qualitative results of AttnGAN with style blocks on the CUB dataset. Additionally, on the challenging COCO dataset, our approach achieves results competitive with the state-of-the-art Lafite model and improves on the FID score of the SSAGAN model by 44.
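The abstract describes the two contrastive objectives only in words. The sketch below shows one plausible way such losses are often implemented in PyTorch; the function names, the InfoNCE-style formulation, the temperature value, and the feature shapes are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def fake_to_fake_loss(f1: torch.Tensor, f2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Illustrative fake-to-fake loss (assumed form): f1[i] and f2[i] are
    features of two images generated from the SAME caption i. Same-caption
    pairs act as positives; images from other captions in the batch act as
    negatives."""
    z1 = F.normalize(f1, dim=-1)
    z2 = F.normalize(f2, dim=-1)
    logits = z1 @ z2.t() / tau                  # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    # Symmetric InfoNCE: each generated image should match its same-caption twin.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

def fake_to_real_loss(fake: torch.Tensor, real: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Illustrative fake-to-real loss (assumed form): pull each generated
    image's features toward the real image of the same caption and away from
    the other real images in the batch, narrowing the real/fake feature gap."""
    zf = F.normalize(fake, dim=-1)
    zr = F.normalize(real, dim=-1)
    logits = zf @ zr.t() / tau
    labels = torch.arange(zf.size(0), device=zf.device)
    return F.cross_entropy(logits, labels)

# Toy usage: random features stand in for an image encoder's outputs.
f1, f2 = torch.randn(8, 256), torch.randn(8, 256)   # two fakes per caption
real = torch.randn(8, 256)                          # matching real images
total = fake_to_fake_loss(f1, f2) + fake_to_real_loss(f1, real)
```

In practice, terms like these would presumably be added to the generator's adversarial objective with weighting coefficients tuned per dataset.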
DOI: 10.48550/arxiv.2312.10854