OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system

We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution m...

Detailed description

Saved in:
Bibliographic details
Main authors: Kisun You, Youngjoon Lee, Wonyong Sung
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 624
container_issue
container_start_page 621
container_title
container_volume
creator Kisun You
Youngjoon Lee
Wonyong Sung
description We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. However, the search network involved in the Viterbi beam search is statically partitioned into independent subtrees to reduce memory synchronization overhead. In order to further improve the performance, a workload predictive thread assignment strategy as well as a false cache line sharing prevention method are employed. The test was conducted using WSJ1 20 k test and development set. We achieved the speed-up of 3.90 by utilizing four threads parallelization in a four-core system compared to four copies of the baseline single thread speech recognizer running simultaneously. The final recognition system runs about twice the speed of the real-time requirement.
doi_str_mv 10.1109/ICASSP.2009.4959660
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4959660</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4959660</ieee_id><sourcerecordid>4959660</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-b133e4f81e5b99e342f44e20c356d56446f5fe00ec40f6172e29da5fbb0467d03</originalsourceid><addsrcrecordid>eNpVkMtqwzAURNUXNKT5gmz0A0qvpCs5WpbQF6QkkBa6KUG2r1oVv7DsRfr1DTSbzmYWcxiYYWwuYSEluNvn1d1ut10oALdAZ5y1cMZmLltKVIhKGzTnbKJ05oR08H7xL9PLSzaRRoGwEt01m6X0DUeh0RLNhH1sOmpetiL3iUre-d5XFVU81l1FNTWDH2Lb8DZwz4u2GWIztmPiqSMqvnhPRfvZxB_q-RHyvB6rIYqi7YmnQxqovmFXwVeJZiefsreH-9fVk1hvHo-z1iLKzAwil1oThqUkkztHGlVAJAWFNrY0FtEGEwiACoRgZaZIudKbkOeANitBT9n8rzcS0b7rY-37w_50lv4F37paYw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Kisun You ; Youngjoon Lee ; Wonyong Sung</creator><creatorcontrib>Kisun You ; Youngjoon Lee ; Wonyong Sung</creatorcontrib><description>We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. However, the search network involved in the Viterbi beam search is statically partitioned into independent subtrees to reduce memory synchronization overhead. In order to further improve the performance, a workload predictive thread assignment strategy as well as a false cache line sharing prevention method are employed. The test was conducted using WSJ1 20 k test and development set. 
We achieved the speed-up of 3.90 by utilizing four threads parallelization in a four-core system compared to four copies of the baseline single thread speech recognizer running simultaneously. The final recognition system runs about twice the speed of the real-time requirement.</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 9781424423538</identifier><identifier>ISBN: 1424423538</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 9781424423545</identifier><identifier>EISBN: 1424423546</identifier><identifier>DOI: 10.1109/ICASSP.2009.4959660</identifier><language>eng</language><publisher>IEEE</publisher><subject>Distributed computing ; Libraries ; Load management ; OpenMP ; Parallel processing ; Parallelization ; Real time systems ; Scalability ; Speech recognition ; Testing ; Viterbi algorithm ; Yarn</subject><ispartof>2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.621-624</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4959660$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4959660$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Kisun You</creatorcontrib><creatorcontrib>Youngjoon Lee</creatorcontrib><creatorcontrib>Wonyong Sung</creatorcontrib><title>OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system</title><title>2009 IEEE International Conference on Acoustics, Speech and Signal Processing</title><addtitle>ICASSP</addtitle><description>We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. 
A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. However, the search network involved in the Viterbi beam search is statically partitioned into independent subtrees to reduce memory synchronization overhead. In order to further improve the performance, a workload predictive thread assignment strategy as well as a false cache line sharing prevention method are employed. The test was conducted using WSJ1 20 k test and development set. We achieved the speed-up of 3.90 by utilizing four threads parallelization in a four-core system compared to four copies of the baseline single thread speech recognizer running simultaneously. The final recognition system runs about twice the speed of the real-time requirement.</description><subject>Distributed computing</subject><subject>Libraries</subject><subject>Load management</subject><subject>OpenMP</subject><subject>Parallel processing</subject><subject>Parallelization</subject><subject>Real time systems</subject><subject>Scalability</subject><subject>Speech recognition</subject><subject>Testing</subject><subject>Viterbi 
algorithm</subject><subject>Yarn</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>9781424423538</isbn><isbn>1424423538</isbn><isbn>9781424423545</isbn><isbn>1424423546</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpVkMtqwzAURNUXNKT5gmz0A0qvpCs5WpbQF6QkkBa6KUG2r1oVv7DsRfr1DTSbzmYWcxiYYWwuYSEluNvn1d1ut10oALdAZ5y1cMZmLltKVIhKGzTnbKJ05oR08H7xL9PLSzaRRoGwEt01m6X0DUeh0RLNhH1sOmpetiL3iUre-d5XFVU81l1FNTWDH2Lb8DZwz4u2GWIztmPiqSMqvnhPRfvZxB_q-RHyvB6rIYqi7YmnQxqovmFXwVeJZiefsreH-9fVk1hvHo-z1iLKzAwil1oThqUkkztHGlVAJAWFNrY0FtEGEwiACoRgZaZIudKbkOeANitBT9n8rzcS0b7rY-37w_50lv4F37paYw</recordid><startdate>200904</startdate><enddate>200904</enddate><creator>Kisun You</creator><creator>Youngjoon Lee</creator><creator>Wonyong Sung</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>200904</creationdate><title>OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system</title><author>Kisun You ; Youngjoon Lee ; Wonyong Sung</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-b133e4f81e5b99e342f44e20c356d56446f5fe00ec40f6172e29da5fbb0467d03</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Distributed computing</topic><topic>Libraries</topic><topic>Load management</topic><topic>OpenMP</topic><topic>Parallel processing</topic><topic>Parallelization</topic><topic>Real time systems</topic><topic>Scalability</topic><topic>Speech recognition</topic><topic>Testing</topic><topic>Viterbi algorithm</topic><topic>Yarn</topic><toplevel>online_resources</toplevel><creatorcontrib>Kisun You</creatorcontrib><creatorcontrib>Youngjoon 
Lee</creatorcontrib><creatorcontrib>Wonyong Sung</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kisun You</au><au>Youngjoon Lee</au><au>Wonyong Sung</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system</atitle><btitle>2009 IEEE International Conference on Acoustics, Speech and Signal Processing</btitle><stitle>ICASSP</stitle><date>2009-04</date><risdate>2009</risdate><spage>621</spage><epage>624</epage><pages>621-624</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>9781424423538</isbn><isbn>1424423538</isbn><eisbn>9781424423545</eisbn><eisbn>1424423546</eisbn><abstract>We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. However, the search network involved in the Viterbi beam search is statically partitioned into independent subtrees to reduce memory synchronization overhead. In order to further improve the performance, a workload predictive thread assignment strategy as well as a false cache line sharing prevention method are employed. The test was conducted using WSJ1 20 k test and development set. 
We achieved the speed-up of 3.90 by utilizing four threads parallelization in a four-core system compared to four copies of the baseline single thread speech recognizer running simultaneously. The final recognition system runs about twice the speed of the real-time requirement.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2009.4959660</doi><tpages>4</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.621-624
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_4959660
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Distributed computing
Libraries
Load management
OpenMP
Parallel processing
Parallelization
Real time systems
Scalability
Speech recognition
Testing
Viterbi algorithm
Yarn
title OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T05%3A07%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=OpenMP-based%20parallel%20implementation%20of%20a%20continuous%20speech%20recognizer%20on%20a%20multi-core%20system&rft.btitle=2009%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Kisun%20You&rft.date=2009-04&rft.spage=621&rft.epage=624&rft.pages=621-624&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424423538&rft.isbn_list=1424423538&rft_id=info:doi/10.1109/ICASSP.2009.4959660&rft_dat=%3Cieee_6IE%3E4959660%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424423545&rft.eisbn_list=1424423546&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4959660&rfr_iscdi=true
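
The description field above mentions a static partitioning of the search network into independent subtrees combined with a "workload predictive thread assignment strategy", but the record does not say how the workload is predicted or how subtrees are mapped to threads. The following is only a minimal sketch of one common way such an assignment could be done: a greedy heaviest-first heuristic that always gives the next subtree to the least-loaded thread. The function names `assign_subtrees` and `thread_loads`, the per-subtree `predicted_load` values, and the heuristic itself are assumptions for illustration, not the authors' method.

```python
import heapq
from typing import List, Sequence


def assign_subtrees(predicted_load: Sequence[float], num_threads: int) -> List[List[int]]:
    """Assign subtree indices to threads, heaviest predicted load first.

    Hypothetical illustration: a greedy heuristic, not the paper's algorithm.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")

    # Min-heap of (accumulated predicted load, thread id).
    heap = [(0.0, t) for t in range(num_threads)]
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_threads)]

    # Visit subtrees from heaviest to lightest predicted load.
    order = sorted(range(len(predicted_load)),
                   key=lambda i: predicted_load[i], reverse=True)
    for subtree in order:
        load, thread = heapq.heappop(heap)  # currently least-loaded thread
        assignment[thread].append(subtree)
        heapq.heappush(heap, (load + predicted_load[subtree], thread))
    return assignment


def thread_loads(predicted_load: Sequence[float],
                 assignment: List[List[int]]) -> List[float]:
    """Sum the predicted load that each thread received."""
    return [sum(predicted_load[i] for i in subtrees) for subtrees in assignment]


if __name__ == "__main__":
    # Made-up predicted loads for eight subtrees, spread over four threads.
    loads = [8, 7, 6, 5, 4, 3, 2, 1]
    plan = assign_subtrees(loads, 4)
    print(plan)
    print(thread_loads(loads, plan))
```

Because each subtree is owned by exactly one thread for the whole frame, a fixed plan of this kind needs no locking during the search itself, which is consistent with the description's stated goal of reducing memory synchronization overhead. In practice the predicted loads might come from the active-state counts of the previous frame, but that too is an assumption rather than something the record states.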