Low-Rank Active Learning for Generating Speech-Driven Human Face Animation

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, pp. 38758-38764
Main authors: Xu, Hui; Yu, Xiaoyang; Cheng, Yu; Xiao, Mengqiong; Yu, Yue
Format: Article
Language: English
Online access: Full text
Description
Summary: Emotion- and speech-based human facial animation is a useful component of many artificial intelligence systems. Given a speech signal, a recognizer outputs a sequence of phoneme and emotion pairs. From this sequence we compute the corresponding sequence of viseme and expression pairs, which is then transformed into a consistent and synchronous facial animation video. This article introduces a novel facial animation technique that intelligently generates real human face animation videos from emotional speech. More specifically, we first extract acoustic features that are sufficiently discriminative for the emotion and phoneme pairs and compute the corresponding sequence of phoneme and emotion pairs. Next, we propose a low-rank active learning paradigm for discovering multiple key facial frames that best represent these phoneme and emotion pairs in the feature subspace. We associate each phoneme and emotion pair with a key facial frame, and the well-known morphing technique then fits the associated key frames into a smooth animated facial video. In particular, we generate multiple transitional facial frames between each pair of temporally adjacent key frames. Experiments demonstrate that the synthesized facial videos look realistic and smooth and remain synchronous with different male and female speech inputs.
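
The abstract describes a pipeline of selecting representative key facial frames in a feature subspace and then morphing between temporally adjacent key frames. The sketch below is only a rough illustration of that idea, not the authors' algorithm: it substitutes a generic leverage-score selection on a truncated SVD for the low-rank active learning step and a linear cross-dissolve for the morphing step, and it omits the speech-side phoneme/emotion recognition entirely. The function names, parameters, and placeholder data (select_key_frames, morph_transitions, n_key, rank, face_frames) are all hypothetical.

```python
# Minimal sketch, assuming flattened facial frames as feature vectors.
# This is NOT the paper's low-rank active learning formulation; it only
# illustrates subspace-based key-frame selection plus linear morphing.
import numpy as np

def select_key_frames(features, n_key, rank=10):
    """Pick n_key representative frames from a (n_frames, n_dims) matrix by
    projecting onto a rank-`rank` subspace and ranking frames by their
    leverage scores (row energy in that subspace)."""
    U, s, Vt = np.linalg.svd(features, full_matrices=False)
    leverage = np.sum(U[:, :rank] ** 2, axis=1)        # per-frame leverage score
    key_idx = np.sort(np.argsort(leverage)[-n_key:])   # keep temporal order
    return key_idx

def morph_transitions(key_frames, n_between=5):
    """Generate transitional frames between each pair of temporally adjacent
    key frames via linear cross-dissolve (a stand-in for image morphing)."""
    video = []
    for a, b in zip(key_frames[:-1], key_frames[1:]):
        for t in np.linspace(0.0, 1.0, n_between, endpoint=False):
            video.append((1.0 - t) * a + t * b)
    video.append(key_frames[-1])
    return np.stack(video)

# Hypothetical usage with random placeholder frames (200 frames of 64x64 pixels).
face_frames = np.random.rand(200, 64 * 64)
key_idx = select_key_frames(face_frames, n_key=8)
animation = morph_transitions(face_frames[key_idx], n_between=10)
print(animation.shape)  # (71, 4096): 7 transitions x 10 frames + final key frame
```

In practice the key frames would be chosen jointly with the phoneme and emotion labels, and the transitional frames would come from a proper face-morphing method rather than pixel-wise blending; the sketch only shows where those two stages sit in the pipeline.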
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3374777