Towards Accurate Generative Models of Video: A New Metric & Challenges
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | Recent advances in deep generative models have led to remarkable progress in
synthesizing high-quality images. Following their successful application in
image processing and representation learning, an important next step is to
consider videos. Learning generative models of video is a much harder task,
requiring a model to capture the temporal dynamics of a scene in addition to
the visual presentation of objects. While recent attempts at formulating
generative models of video have had some success, current progress is hampered
by (1) the lack of qualitative metrics that consider visual quality, temporal
coherence, and diversity of samples, and (2) the wide gap between purely
synthetic video data sets and challenging real-world data sets in terms of
complexity. To this end we propose Fréchet Video Distance (FVD), a new
metric for generative models of video, and StarCraft 2 Videos (SCV), a
benchmark of gameplay from custom StarCraft 2 scenarios that challenge the
current capabilities of generative models of video. We contribute a large-scale
human study, which confirms that FVD correlates well with qualitative human
judgment of generated videos, and provide initial benchmark results on SCV. |
---|---|
DOI: | 10.48550/arxiv.1812.01717 |