SYNTHESIZED SPEECH AUDIO DATA GENERATED ON BEHALF OF HUMAN PARTICIPANT IN CONVERSATION

Bibliographic details
Main authors:	ZADA, Nida; SEGUIN, Julie Anne; ALLEN, Brian F; BOWERS, Mark
Format:	Patent
Language:	English; French
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	Generating synthesized speech audio data on behalf of a given user in a conversation. The synthesized speech audio data includes synthesized speech that incorporates textual segment(s). The textual segment(s) can include recognized text that results from processing spoken input of the given user using a speech recognition model, and/or can include a selection of a rendered suggestion that conveys the textual segment(s). Some implementations dynamically determine one or more prosodic properties for use in speech synthesis of the textual segment(s), and generate the synthesized speech with the one or more determined prosodic properties. The prosodic properties can be determined based on the textual segment(s) used in speech synthesis, textual segment(s) corresponding to recent spoken input of additional participant(s), attribute(s) of relationship(s) between the given user and additional participant(s) in the conversation, and/or feature(s) of a current location for the conversation.
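
The flow described in the summary can be illustrated with a minimal Python sketch. It is not taken from the patent: the class names, the relationship and location labels, the adjustment values, and the simple rule-based logic are all assumptions chosen for illustration. The record itself does not say how prosodic properties are computed, only which signals they can be based on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationContext:
    # All labels below are hypothetical; the record does not define them.
    relationship: str         # e.g. "friend" or "manager"
    location: str             # e.g. "home" or "office"
    recent_partner_text: str  # text recognized from another participant


@dataclass
class Prosody:
    rate: float = 1.0    # relative speaking rate
    pitch: float = 1.0   # relative pitch
    volume: float = 1.0  # relative loudness


def choose_segment(recognized_text: Optional[str],
                   selected_suggestion: Optional[str]) -> str:
    """Prefer a suggestion the user selected, else fall back to recognized text."""
    segment = selected_suggestion or recognized_text
    if not segment:
        raise ValueError("no textual segment available")
    return segment


def determine_prosody(segment: str, ctx: ConversationContext) -> Prosody:
    """Toy rules standing in for the dynamic determination in the summary."""
    prosody = Prosody()
    if segment.endswith("?"):                  # signal: the textual segment itself
        prosody.pitch += 0.1
    if ctx.recent_partner_text.endswith("!"):  # signal: recent partner input
        prosody.rate += 0.1
    if ctx.relationship == "manager":          # signal: relationship attribute
        prosody.rate -= 0.1
    if ctx.location == "office":               # signal: current location
        prosody.volume -= 0.2
    return prosody


def build_synthesis_request(segment: str, prosody: Prosody) -> dict:
    """Placeholder for a speech synthesizer call; returns the request only."""
    return {"text": segment, "rate": prosody.rate,
            "pitch": prosody.pitch, "volume": prosody.volume}


ctx = ConversationContext("manager", "office", "Can you send the report?")
segment = choose_segment("Yes, I will send it today.", None)
request = build_synthesis_request(segment, determine_prosody(segment, ctx))
print(request)
```

A real system would replace the hand-written rules with whatever model or policy the implementation uses, and would pass the resulting properties to an actual speech synthesizer rather than returning a dictionary.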