Voice Signal Processing System

Bibliographic details
Main authors: André Ferreira Gil, Matthias Eichner, Felix Schaeffler
Format: Patent
Language: English
Description
Summary: Audio data is classified using voice descriptors (i.e., a desired style or voice quality). Non-semantic speech features (e.g., pitch, tone, volume, facial expression, gestures, body language) are extracted from vocal samples, a reduced representation of each vocal sample is generated, and a trained model is then applied to predict the voice descriptor (e.g., warm, professional, serious, conversational, direct, clear, friendly, natural, sensual, trustworthy, formal; see fig. 8). A second trained model may be applied to an intermediate representation based on a vocal profile (e.g., vocal tract, muscular tension, or phonetic features).
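
The three stages named in the summary (feature extraction, reduced representation, trained model) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the synthetic sine-wave "vocal samples", the three features (RMS volume, zero-crossing rate, power-weighted spectral centroid), PCA as the reduction step, a nearest-centroid classifier as the trained model, and the mapping of low, quiet tones to "warm" and higher, louder tones to "clear". All function names are hypothetical.

```python
import numpy as np

SR = 16_000  # assumed sample rate in Hz (not specified in the record)

def synth_sample(freq, amp, rng, sr=SR, seconds=0.5):
    """Hypothetical stand-in for a vocal sample: a noisy sine tone."""
    t = np.arange(int(sr * seconds)) / sr
    return amp * np.sin(2 * np.pi * freq * t) + 0.01 * rng.standard_normal(t.size)

def extract_features(signal, sr=SR):
    """Non-semantic features: RMS volume, zero-crossing rate, spectral centroid."""
    rms = np.sqrt(np.mean(signal ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(signal))) > 0)
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * power) / (np.sum(power) + 1e-12)
    return np.array([rms, zcr, centroid])

def fit_reducer(X, n_components=2):
    """Reduced representation via PCA (standardize, then SVD)."""
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-12
    _, _, vt = np.linalg.svd((X - mean) / std, full_matrices=False)
    return mean, std, vt[:n_components]

def reduce(X, reducer):
    mean, std, components = reducer
    return ((X - mean) / std) @ components.T

def fit_centroids(R, labels):
    """'Trained model': one mean point per voice descriptor."""
    labels = np.array(labels)
    return {lab: R[labels == lab].mean(axis=0) for lab in sorted(set(labels))}

def predict(r, centroids):
    """Return the descriptor whose centroid is closest to r."""
    return min(centroids, key=lambda lab: np.linalg.norm(r - centroids[lab]))

# Build a toy training set: 10 samples per descriptor (assumed mapping).
rng = np.random.default_rng(0)
signals, labels = [], []
for _ in range(10):
    signals.append(synth_sample(rng.uniform(110, 130), rng.uniform(0.25, 0.35), rng))
    labels.append("warm")
    signals.append(synth_sample(rng.uniform(220, 260), rng.uniform(0.7, 0.9), rng))
    labels.append("clear")

X = np.array([extract_features(s) for s in signals])
reducer = fit_reducer(X)
centroids = fit_centroids(reduce(X, reducer), labels)

def classify(signal):
    """Full pipeline: features -> reduced representation -> descriptor."""
    return predict(reduce(extract_features(signal), reducer), centroids)

print(classify(synth_sample(125, 0.3, rng)))
print(classify(synth_sample(250, 0.8, rng)))
```

A real system would presumably replace each stage with something far richer (learned speech embeddings, a neural classifier, many descriptors per sample); the sketch only shows how the stages connect.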