Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control

Current diffusion models create photorealistic images given a text prompt as input but struggle to correctly bind attributes mentioned in the text to the right objects in the image. This is evidenced by our novel image-graph alignment model called EPViT (Edge Prediction Vision Transformer) for the e...

Bibliographic details
Authors: Trusca, Maria Mihaela, Nuyts, Wolf, Thomm, Jonathan, Honig, Robert, Hofmann, Thomas, Tuytelaars, Tinne, Moens, Marie-Francine
Format: Article
Language: English