Unified Discrete Diffusion for Simultaneous Vision-Language Generation

Recently developed discrete diffusion models perform extraordinarily well on the text-to-image task, showing significant promise for handling multimodal signals. In this work, we harness these traits and present a unified multimodal generation model that can conduct both the "modali...

Bibliographic details
Main authors:	Hu, Minghui; Zheng, Chuanxia; Zheng, Heliang; Cham, Tat-Jen; Wang, Chaoyue; Yang, Zuopeng; Tao, Dacheng; Suganthan, Ponnuthurai N
Format:	Article
Language:	English
Subjects:	Computer Science - Computer Vision and Pattern Recognition