Low Latency Real-Time Vocal Tract Length Normalization

Bibliographic details
Main authors:	Ljolje, Andrej; Goffin, Vincent; Saraclar, Murat
Format:	Conference proceeding
Language:	English
Keywords:	Acoustic Model; Applied sciences; Artificial intelligence; Baseline Model; Computer science, control theory, systems; Exact sciences and technology; Full Search; Speech and sound recognition and synthesis. Linguistics; Speech Segment; Word Error Rate

Description
Abstract:	Vocal Tract Length Normalization (VTLN) is a well-established and successful technique for speaker normalization. It can be applied in the recognition stage, but the improvements roughly double if the same algorithm is also applied to the training data before the acoustic model is built. The most common implementation uses a few minutes of speech or more per speaker, and the final result has significant latency even if the recognition itself runs faster than real time. In this work we address the following constraints: a reduced amount of data per speaker in training and testing, and reduced latency, with no latency as the ultimate goal. The experiments show that although these restrictions reduce the performance improvements possible with VTLN, a real-time implementation of VTLN is not only practical but highly desirable.
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-540-30120-2_47