Audio-driven speaker video synthesis method and system fused with a neural radiance field


Detailed Description

Bibliographic Details
Inventors: FENG SIWEI, ZHU YUEBING, LI YONGYUAN
Format: Patent
Language: Chinese; English
Description
Abstract: The invention provides an audio-driven speaker video synthesis method and system fused with a neural radiance field. The method comprises the steps of: obtaining a video data set in an environment, randomly selecting a period of the video data set, and parsing a video sequence and an audio sequence from it; extracting face features from the video sequence and audio features from the audio sequence; constructing an audio-conditioned implicit function F_θ, feeding the extracted face-feature and audio-feature parameters into F_θ for training, and computing audio-conditioned color values and volume densities; and, according to those color values and volume densities, using a volume rendering technique to render the visible face and background from the dynamic neural radiance field, synthesizing a high-fidelity speaker video corresponding to the audio signal. According to the method, the
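The pipeline the abstract describes can be sketched in miniature: an audio-conditioned implicit function F_θ maps a 3D point and an audio feature to a color and a volume density, and a volume-rendering pass alpha-composites samples along each camera ray. The sketch below is a toy illustration under stated assumptions (random untrained weights, a placeholder 16-dimensional audio feature, a tiny one-hidden-layer network); the names `f_theta` and `render_ray` and all shapes are hypothetical, not the patent's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_theta(x, a, W1, W2):
    """Toy audio-conditioned implicit function: (3D point x, audio feature a) -> (rgb, sigma)."""
    h = np.tanh(np.concatenate([x, a]) @ W1)   # single hidden layer
    out = h @ W2                               # 4 outputs: rgb (3) + density (1)
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))       # sigmoid keeps colors in [0, 1]
    sigma = np.log1p(np.exp(out[3]))           # softplus keeps density >= 0
    return rgb, sigma

def render_ray(origin, direction, a, W1, W2, n_samples=32, near=0.0, far=4.0):
    """Classic volume rendering: alpha-composite samples along one ray."""
    t = np.linspace(near, far, n_samples)
    delta = np.diff(t, append=t[-1] + (far - near) / n_samples)
    color = np.zeros(3)
    transmittance = 1.0
    for ti, di in zip(t, delta):
        rgb, sigma = f_theta(origin + ti * direction, a, W1, W2)
        alpha = 1.0 - np.exp(-sigma * di)      # opacity of this segment
        color += transmittance * alpha * rgb   # composite front-to-back
        transmittance *= 1.0 - alpha
    return color

audio_feat = rng.normal(size=16)               # placeholder audio feature vector
W1 = rng.normal(size=(19, 32)) * 0.1           # 3 (xyz) + 16 (audio) inputs
W2 = rng.normal(size=(32, 4)) * 0.1
pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), audio_feat, W1, W2)
print(pixel.shape)  # one RGB pixel, shape (3,)
```

In a real system the random weights would be replaced by a trained MLP, the audio feature would come from a speech encoder, and one such ray would be rendered per output pixel; this sketch only shows how the audio-conditioned density and color feed the compositing step.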