VOICE PROCESSING DEVICE AND PROGRAM

Bibliographic details
Main author: KAINUMA, Ken-ichi
Format: Patent
Language: eng; fre; ger
Description
Summary: Synthesis of emotional speech is realized while settings unique to each speaker are taken into account. A speech processing apparatus is provided that extracts face feature points, frame by frame, from moving image data obtained by imaging the face of a speaker. A first generation network is generated that produces the face feature points of the corresponding frame from speech feature data extracted, frame by frame, from the speaker's uttered speech, and an identification network is used to evaluate whether the first generation network is appropriate. A second generation network is then generated that produces the uttered speech from three inputs: a plurality of uncertain settings, including at least text representing the utterance content of the uttered speech and information indicating the emotions included in the uttered speech; a plurality of types of fixed settings that define the speech quality of the speaker; and the face feature points generated by the first generation network once it has been evaluated as appropriate. Whether the second generation network is appropriate is evaluated using the same identification network.
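
The order of generation and evaluation described in the summary can be illustrated with a small structural sketch. It is not the patented implementation: the two "networks" and the identification step are replaced by simple placeholder functions, and every name (`first_generator`, `second_generator`, `identifier`, `process_frame`), the emotion labels, and the numeric encodings are assumptions made only for illustration. The sketch shows the data flow for a single frame: speech features produce face feature points, those points are checked, and only accepted points are passed on to produce speech, which is checked by the same evaluator.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vector = List[float]


@dataclass
class UncertainSettings:
    text: str     # utterance content
    emotion: str  # emotion label (hypothetical encoding)


@dataclass
class FixedSettings:
    voice_quality: Vector  # speaker-specific parameters (hypothetical encoding)


def first_generator(speech_features: Vector, n_points: int) -> Vector:
    """Placeholder for the first generation network:
    one frame of speech features -> face feature points."""
    mean = sum(speech_features) / len(speech_features)
    return [mean * (i + 1) for i in range(n_points)]


def identifier(candidate: Vector, reference: Vector, tol: float = 0.5) -> bool:
    """Placeholder for the identification network: accepts the candidate
    when its mean absolute error against the reference is within tol."""
    error = sum(abs(c - r) for c, r in zip(candidate, reference)) / len(reference)
    return error <= tol


def second_generator(uncertain: UncertainSettings,
                     fixed: FixedSettings,
                     face_points: Vector) -> Vector:
    """Placeholder for the second generation network: combines the
    uncertain settings, fixed settings, and face points into one speech frame."""
    gain = {"neutral": 1.0, "joy": 1.2, "anger": 1.5}.get(uncertain.emotion, 1.0)
    offset = 0.01 * len(uncertain.text) + sum(face_points) / len(face_points)
    return [gain * (q + offset) for q in fixed.voice_quality]


def process_frame(speech_features: Vector,
                  extracted_face_points: Vector,
                  uncertain: UncertainSettings,
                  fixed: FixedSettings,
                  reference_speech: Vector) -> Optional[Tuple[Vector, bool]]:
    """Runs both stages for one frame. Returns None when the first stage
    is rejected, otherwise the generated speech and its evaluation result."""
    face_points = first_generator(speech_features, len(extracted_face_points))
    if not identifier(face_points, extracted_face_points):
        return None  # first stage not yet judged appropriate
    speech = second_generator(uncertain, fixed, face_points)
    return speech, identifier(speech, reference_speech)
```

In a real system the placeholder functions would be trained models and the identification step would be learned rather than a fixed error threshold; the sketch keeps only the sequencing, in which the second stage consumes face feature points only after the first stage has been accepted.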