A Pytorch Reproduction of Masked Generative Image Transformer
Format: | Article |
Language: | English |
Abstract: | In this technical report, we present a reproduction of MaskGIT: Masked
Generative Image Transformer, using PyTorch. The approach involves leveraging a
masked bidirectional transformer architecture, enabling image generation in
only a few steps (8~16) for 512 x 512 resolution images, i.e., ~64x faster
than an auto-regressive approach. Through rigorous experimentation and
optimization, we achieved results that closely align with the findings
presented in the original paper. We match the reported FID of 7.32 with our
replication and obtain 7.59 with similar hyperparameters on ImageNet at
resolution 512 x 512. Moreover, we improve over the official implementation
with some minor hyperparameter tweaking, achieving FID of 7.26. At the lower
resolution of 256 x 256 pixels, our reimplementation scores 6.80, in comparison
to the original paper's 6.18. To promote further research on Masked Generative
Models and facilitate their reproducibility, we released our code and
pre-trained weights openly at https://github.com/valeoai/MaskGIT-pytorch/ |
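The few-step generation described in the abstract comes from MaskGIT's parallel decoding: all token positions start masked, the bidirectional transformer predicts every masked position at once, and a cosine schedule decides how many low-confidence predictions are re-masked for the next step. The sketch below is illustrative only and is not taken from the released code; `dummy_predict` is a hypothetical stand-in for the transformer, and the codebook size of 1024 is an assumed value.

```python
import math
import random

MASK = -1  # sentinel for a not-yet-decoded visual token

def mask_schedule(step, total_steps, num_tokens):
    """Cosine schedule from the MaskGIT paper: number of tokens that
    should remain masked after the given decoding step."""
    ratio = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return math.floor(ratio * num_tokens)

def iterative_decode(predict_fn, num_tokens, total_steps=8):
    """Sketch of MaskGIT-style parallel decoding (not the official
    implementation). predict_fn(tokens) must return (predictions,
    confidences): a candidate token and a confidence score for every
    position -- a hypothetical stand-in for the transformer."""
    tokens = [MASK] * num_tokens
    for step in range(total_steps):
        preds, confs = predict_fn(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Fill every masked position with the current prediction.
        for i in masked:
            tokens[i] = preds[i]
        # How many positions the schedule keeps masked after this step.
        n_mask = mask_schedule(step, total_steps, num_tokens)
        if n_mask == 0:
            break
        # Re-mask the least-confident newly filled positions; tokens
        # accepted in earlier steps stay fixed.
        masked.sort(key=lambda i: confs[i])
        for i in masked[:n_mask]:
            tokens[i] = MASK
    return tokens

# Usage with a random stand-in "model" over an assumed 1024-entry codebook:
rng = random.Random(0)
def dummy_predict(tokens):
    n = len(tokens)
    return ([rng.randrange(1024) for _ in range(n)],
            [rng.random() for _ in range(n)])

decoded = iterative_decode(dummy_predict, num_tokens=256, total_steps=8)
```

Because the cosine ratio reaches zero at the final step, every position is decoded after `total_steps` iterations, which is how 8~16 steps suffice regardless of the number of tokens, versus one step per token for an auto-regressive decoder.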
DOI: | 10.48550/arxiv.2310.14400 |