Low Latency Real-Time Vocal Tract Length Normalization
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Full text |
Summary: | Vocal Tract Length Normalization (VTLN) is a well-established and successful technique for speaker normalization. It can be applied in the recognition stage alone, but the improvements are roughly doubled if the same algorithm is also applied to the training data before the acoustic model is built. The most common implementation uses a few minutes of speech or more per speaker, and the final result has significant latency even if the recognition itself runs faster than real time. In this work we address two constraints: a reduced amount of data per speaker in training and testing, and reduced latency, with zero latency as the ultimate goal. The experiments show that although these restrictions limit the performance improvements possible with VTLN, a real-time implementation of VTLN is not only practical but highly desirable. |
ISSN: | 0302-9743, 1611-3349 |
DOI: | 10.1007/978-3-540-30120-2_47 |