GACnet-Text-to-Image Synthesis With Generative Models Using Attention Mechanisms With Contrastive Learning

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, pp. 9572-9585
Authors: Habib, Md. Ahsan; Wadud, Md. Anwar Hussen; Pinky, Lubna Yeasmin; Talukder, Mehedi Hasan; Rahman, Mohammad Motiur; Mridha, M. F.; Okuyama, Yuichi; Shin, Jungpil
Format: Article
Language: English
Online Access: Full text
Description
Abstract: The generation of high-quality images from textual descriptions is a challenging task in computer vision and natural language processing. Text-to-image synthesis, an active area of research, aims to produce high-quality images from written descriptions. This study proposes a hybrid approach, evaluated on a dataset of text-image pairs, that efficiently combines conditional generative adversarial networks (C-GAN), attention mechanisms, and contrastive learning (C-GAN+ATT+CL). We propose a two-step method to improve image quality: first, generative adversarial networks (GANs) with attention mechanisms create low-resolution images; then, contrastive learning refines them. The contrastive learning module trains on a separate dataset of high-resolution images, while the GANs learn from datasets of low-resolution text-image pairs. Among the methods compared, the conditional GAN with attention mechanism and contrastive learning provides state-of-the-art performance in terms of image quality, diversity, and visual realism. The results demonstrate that the proposed approach outperforms all other methods, achieving an Inception Score (IS) of 35.23, a Fréchet Inception Distance (FID) of 18.2, and an R-Precision of 89.14. Our findings show that the "C-GAN+ATT+CL" approach significantly improves image quality and diversity and offers promising directions for further study.
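The abstract's refinement stage relies on a contrastive objective that pulls matching image-text pairs together and pushes mismatched pairs apart. The record does not state the exact loss used; as an illustration only, a common InfoNCE-style formulation (with the function name `info_nce_loss` and the temperature value as assumptions, not taken from the paper) can be sketched in NumPy as:

```python
import numpy as np

def info_nce_loss(image_emb, text_emb, temperature=0.07):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    Matching (image_i, text_i) pairs are treated as positives; every
    other pair in the batch serves as an in-batch negative.
    """
    # L2-normalize so the dot product becomes cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # logits[i, j] = similarity between image i and text j, scaled by temperature
    logits = image_emb @ text_emb.T / temperature

    # Cross-entropy with the diagonal (the matching pair) as the target class
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(logits))
    return -log_probs[idx, idx].mean()

rng = np.random.default_rng(0)
aligned = rng.normal(size=(8, 64))
# Perfectly aligned pairs should incur a lower loss than random pairings
loss_aligned = info_nce_loss(aligned, aligned)
loss_random = info_nce_loss(aligned, rng.normal(size=(8, 64)))
```

Minimizing such a loss encourages the generator's outputs to sit close to their text embeddings in a shared space, which is the general mechanism behind contrastive refinement of generated images.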
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3342866