Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

Video-to-audio (V2A) generation leverages visual-only video features to render plausible sounds that match the scene. Importantly, the onsets of the generated sounds should match the visual actions they are aligned with; otherwise, unnatural synchronization artifacts arise. Recent works have explored th...

Bibliographic details
Published in:	arXiv.org 2024-07
Authors:	Pascual, Santiago; Yeh, Chunghsin; Tsiamas, Ioannis; Serrà, Joan
Format:	Article
Language:	English
Subjects:	Codec; Image enhancement; Image quality; Matching; Semantics; Sound generators; Synchronism; Time synchronization