OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system
We have implemented a 20,000-word continuous speech recognizer on a multi-core system. A fine-grained parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution m...
Main authors: | Kisun You, Youngjoon Lee, Wonyong Sung |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 624 |
---|---|
container_issue | |
container_start_page | 621 |
container_title | |
container_volume | |
creator | Kisun You ; Youngjoon Lee ; Wonyong Sung |
description | We have implemented a 20,000-word continuous speech recognizer on a multi-core system. A fine-grained parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. The search network involved in the Viterbi beam search, however, is statically partitioned into independent subtrees to reduce memory synchronization overhead. To further improve performance, a workload-predictive thread assignment strategy and a method for preventing false cache line sharing are employed. The test was conducted using the WSJ1 20k test and development set. We achieved a speed-up of 3.90 with four-thread parallelization on a four-core system, compared to four copies of the baseline single-threaded speech recognizer running simultaneously. The final recognition system runs at about twice the speed required for real-time operation. |
doi_str_mv | 10.1109/ICASSP.2009.4959660 |
format | Conference Proceeding |
fullrecord | Publisher: IEEE ; Date: 2009-04 ; Pages: 621-624 (4 pages) ; ISBN: 9781424423538, 1424423538 ; EISBN: 9781424423545, 1424423546 ; EISSN: 2379-190X ; IEEE ID: 4959660 ; Record type: conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-6149 |
ispartof | 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.621-624 |
issn | 1520-6149 ; 2379-190X |
language | eng |
recordid | cdi_ieee_primary_4959660 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Distributed computing ; Libraries ; Load management ; OpenMP ; Parallel processing ; Parallelization ; Real time systems ; Scalability ; Speech recognition ; Testing ; Viterbi algorithm ; Yarn |
title | OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T05%3A07%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=OpenMP-based%20parallel%20implementation%20of%20a%20continuous%20speech%20recognizer%20on%20a%20multi-core%20system&rft.btitle=2009%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Kisun%20You&rft.date=2009-04&rft.spage=621&rft.epage=624&rft.pages=621-624&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424423538&rft.isbn_list=1424423538&rft_id=info:doi/10.1109/ICASSP.2009.4959660&rft_dat=%3Cieee_6IE%3E4959660%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424423545&rft.eisbn_list=1424423546&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4959660&rfr_iscdi=true |
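
The description field in this record contrasts a dynamic workload distribution (used for the emission probability computation) with a static partitioning (used for the search network). The following is only a rough illustrative sketch of that contrast, not the authors' implementation and not OpenMP code: it is a pure-Python simulation, and the function names `static_makespan` and `dynamic_makespan`, as well as the task costs, are hypothetical. It also ignores synchronization overhead, which is the reason the description gives for preferring static partitioning in the search stage.

```python
import heapq


def static_makespan(costs, workers):
    """Finish time when tasks are split into contiguous blocks up front."""
    if not costs:
        return 0
    block = -(-len(costs) // workers)  # ceiling division: tasks per worker
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)


def dynamic_makespan(costs, workers):
    """Finish time when each task goes to whichever worker is free first."""
    free_at = [0] * workers  # min-heap of the times at which workers go idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)         # earliest available worker
        heapq.heappush(free_at, start + cost)  # busy until start + cost
    return max(free_at)


# Hypothetical per-task costs: a few expensive tasks clustered at the front.
costs = [8, 7, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(costs, 4), dynamic_makespan(costs, 4))  # prints "21 8"
```

With these made-up costs, the static split puts all three expensive tasks on one worker, while the dynamic assignment spreads them out and finishes close to the ideal of 30 / 4 = 7.5 time units. In a real shared-memory program the dynamic variant pays for that balance with extra coordination between threads.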