OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system
We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution m...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. However, the search network involved in the Viterbi beam search is statically partitioned into independent subtrees to reduce memory synchronization overhead. In order to further improve the performance, a workload predictive thread assignment strategy as well as a false cache line sharing prevention method are employed. The test was conducted using WSJ1 20 k test and development set. We achieved the speed-up of 3.90 by utilizing four threads parallelization in a four-core system compared to four copies of the baseline single thread speech recognizer running simultaneously. The final recognition system runs about twice the speed of the real-time requirement. |
---|---|
ISSN: | 1520-6149 2379-190X |
DOI: | 10.1109/ICASSP.2009.4959660 |