Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Format: Article
Language: English
Online access: Order full text
Abstract: Videos are created to express emotion, exchange information, and share experiences. Video synthesis has intrigued researchers for a long time. Despite the rapid progress driven by advances in visual synthesis, most existing studies focus on improving the quality of individual frames and the transitions between them, while little progress has been made in generating longer videos. In this paper, we present a method that builds on 3D-VQGAN and transformers to generate videos with thousands of frames. Our evaluation shows that our model, trained on 16-frame video clips from standard benchmarks such as UCF-101, Sky Time-lapse, and Taichi-HD, can generate diverse, coherent, and high-quality long videos. We also showcase conditional extensions of our approach for generating meaningful long videos by incorporating temporal information with text and audio. Videos and code can be found at https://songweige.github.io/projects/tats/index.html.
DOI: 10.48550/arxiv.2204.03638