Recurrent Affine Transformation for Text-to-image Synthesis


Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2024-01, Vol. 26, pp. 1-11
Authors: Ye, Senmao; Wang, Huan; Tan, Mingkui; Liu, Fei
Format: Article
Language: English
Description
Abstract: Text-to-image synthesis aims to generate realistic images conditioned on text descriptions. To fuse text information into synthesized images, conditional affine transformations (CATs), such as conditional batch normalization (CBN) and conditional instance normalization (CIN), are usually used to predict batch statistics of different layers. However, ordinary CAT blocks control the batch statistics independently, disregarding the consistency among neighboring layers. To address this issue, we propose a new fusion approach named recurrent affine transformation (RAT) for synthesizing images conditioned on text information. RAT connects all the CAT blocks with recurrent connections to explicitly fit the temporal consistency between CAT blocks. To verify the effectiveness of RAT, we propose a novel visualization method that shows how a generative adversarial network (GAN) fuses conditional information. Our microscopic and macroscopic visualizations not only demonstrate the effectiveness of RAT but also offer a useful perspective for analyzing how GANs fuse conditional information. In addition, we propose a more stable spatial attention mechanism for the discriminator, which helps the text description supervise the generator to synthesize more relevant image contents. Extensive experiments on the CUB, Oxford-102, and COCO datasets demonstrate the proposed model's superiority over state-of-the-art models. Our code is available at https://github.com/senmaoy/RAT-GAN.
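The abstract describes the core mechanism only at a high level: channel-wise affine transformations whose scale and shift parameters are predicted from a recurrent state carried across the generator's CAT blocks. The PyTorch sketch below illustrates one plausible reading of that idea. The module names, the choice of an LSTM cell, and all dimensions are illustrative assumptions, not the authors' reference implementation (which is available at the linked repository).

```python
import torch
import torch.nn as nn

class RATBlock(nn.Module):
    """Applies a conditional affine transformation: scale (gamma) and
    shift (beta) are predicted from a recurrent hidden state."""
    def __init__(self, hidden_dim: int, num_channels: int):
        super().__init__()
        self.to_gamma = nn.Linear(hidden_dim, num_channels)  # per-channel scale
        self.to_beta = nn.Linear(hidden_dim, num_channels)   # per-channel shift

    def forward(self, feat: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); h: (B, hidden_dim)
        gamma = self.to_gamma(h).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_beta(h).unsqueeze(-1).unsqueeze(-1)
        return gamma * feat + beta

class RecurrentConditioner(nn.Module):
    """Steps an LSTM cell once per generator stage, feeding the same text
    embedding at every step, so consecutive CAT blocks share state instead
    of predicting their affine parameters independently."""
    def __init__(self, text_dim: int, hidden_dim: int, channels_per_block):
        super().__init__()
        self.cell = nn.LSTMCell(text_dim, hidden_dim)
        self.blocks = nn.ModuleList(
            RATBlock(hidden_dim, c) for c in channels_per_block
        )
        self.hidden_dim = hidden_dim

    def forward(self, feats, text_emb):
        # feats: list of feature maps, one per generator stage
        h = text_emb.new_zeros(text_emb.size(0), self.hidden_dim)
        c = torch.zeros_like(h)
        outputs = []
        for feat, block in zip(feats, self.blocks):
            h, c = self.cell(text_emb, (h, c))  # recurrent link across blocks
            outputs.append(block(feat, h))
        return outputs

# Toy usage with made-up sizes:
cond = RecurrentConditioner(text_dim=256, hidden_dim=128,
                            channels_per_block=[64, 32])
feats = [torch.randn(2, 64, 8, 8), torch.randn(2, 32, 16, 16)]
text_emb = torch.randn(2, 256)
fused = cond(feats, text_emb)  # each stage's features, modulated by the text
```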
ISSN: 1520-9210
ISSN (online): 1941-0077
DOI: 10.1109/TMM.2023.3266607