Low Latency Real-Time Vocal Tract Length Normalization

Bibliographic details
Authors: Ljolje, Andrej; Goffin, Vincent; Saraclar, Murat
Format: Conference paper
Language: English
Description
Abstract: Vocal Tract Length Normalization (VTLN) is a well-established and successful technique for speaker normalization. It can be applied in the recognition stage, but the improvements are roughly doubled if the same algorithm is also applied to the training data before the acoustic model is built. The most common implementation uses a few minutes of speech or more per speaker, so the final result has significant latency even when recognition runs faster than real time. In this work we address the following constraints: a reduced amount of data per speaker in training and testing, and reduced latency, with zero latency as the ultimate goal. The experiments show that although these restrictions limit the performance improvements possible with VTLN, a real-time implementation of VTLN is not only practical but highly desirable.
ISSN: 0302-9743 (print), 1611-3349 (electronic)
DOI: 10.1007/978-3-540-30120-2_47
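The abstract describes VTLN only at a high level. As a rough illustration of what low-latency warp-factor selection could involve, the Python sketch below is an assumption and not the method of Ljolje, Goffin and Saraclar. It applies a piecewise-linear frequency warp and chooses the warp factor by keeping a running score for each candidate factor, so an estimate is available after every frame instead of after minutes of speaker data. The warp-factor grid (0.88 to 1.12 in steps of 0.02), the 8 kHz `f_max`, the breakpoint `cutoff_ratio`, the convention that alpha > 1 stretches the frequency axis, and the names `piecewise_linear_warp`, `OnlineWarpSelector` and `toy_frame_scores` are all hypothetical. `toy_frame_scores` stands in for the acoustic-model log-likelihoods a real system would compute.

```python
import numpy as np

# Hypothetical grid of candidate warp factors (a commonly used range, assumed here).
ALPHAS = np.round(np.arange(0.88, 1.12 + 1e-9, 0.02), 2)


def piecewise_linear_warp(freqs, alpha, f_max=8000.0, cutoff_ratio=0.875):
    """Scale frequencies by alpha below a breakpoint f0, then follow a second
    linear segment chosen so that f_max maps onto itself (assumed warp shape)."""
    freqs = np.asarray(freqs, dtype=float)
    # Lower the breakpoint when alpha > 1 so that alpha * f0 stays below f_max.
    f0 = cutoff_ratio * f_max / max(alpha, 1.0)
    # Slope of the segment joining (f0, alpha * f0) to (f_max, f_max).
    upper_slope = (f_max - alpha * f0) / (f_max - f0)
    return np.where(freqs <= f0, alpha * freqs,
                    alpha * f0 + upper_slope * (freqs - f0))


class OnlineWarpSelector:
    """Accumulates per-frame scores for every candidate alpha and reports the
    current best alpha after each frame, so no look-ahead is needed."""

    def __init__(self, alphas=ALPHAS, neutral_alpha=1.0):
        self.alphas = np.asarray(alphas, dtype=float)
        self.totals = np.zeros(len(self.alphas))
        self.neutral_alpha = neutral_alpha
        self.frames_seen = 0

    def current_alpha(self):
        # Before any evidence has arrived, fall back to no warping.
        if self.frames_seen == 0:
            return self.neutral_alpha
        return float(self.alphas[np.argmax(self.totals)])

    def update(self, frame_scores):
        # frame_scores[i]: score of the current frame warped with alphas[i]
        # (in a real system, e.g. an acoustic-model log-likelihood).
        self.totals += np.asarray(frame_scores, dtype=float)
        self.frames_seen += 1
        return self.current_alpha()


def toy_frame_scores(alphas):
    # Hypothetical stand-in for model scores; peaks at alpha = 1.06.
    return -100.0 * (np.asarray(alphas, dtype=float) - 1.06) ** 2


selector = OnlineWarpSelector()
print(selector.current_alpha())  # 1.0
for _ in range(5):
    alpha = selector.update(toy_frame_scores(selector.alphas))
print(alpha)  # 1.06
```

Keeping one running score per candidate factor lets the estimate be refined as speech arrives without waiting for the end of the utterance. The trade-off is that the earliest frames are processed with a less reliable estimate, which in this sketch defaults to alpha = 1.0 (no warping).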