OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system

We have implemented a 20,000-word continuous speech recognizer on a multi-core system. A fine-grained parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution m...

Bibliographic details
Main authors:	Kisun You, Youngjoon Lee, Wonyong Sung
Format:	Conference proceeding
Language:	eng
Subjects:	Distributed computing ; Libraries ; Load management ; OpenMP ; Parallel processing ; Parallelization ; Real time systems ; Scalability ; Speech recognition ; Testing ; Viterbi algorithm ; Yarn

container_end_page	624
container_issue
container_start_page	621
container_title
container_volume
creator	Kisun You ; Youngjoon Lee ; Wonyong Sung
description	We have implemented a 20,000-word continuous speech recognizer on a multi-core system. A fine-grained parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. The search network involved in the Viterbi beam search, however, is statically partitioned into independent subtrees to reduce memory synchronization overhead. To further improve performance, a workload-predictive thread assignment strategy and a method for preventing false cache-line sharing are employed. The test was conducted using the WSJ1 20k test and development set. We achieved a speed-up of 3.90 with four-thread parallelization on a four-core system, compared to four copies of the baseline single-threaded speech recognizer running simultaneously. The final recognition system runs at about twice the speed required for real time.
doi_str_mv	10.1109/ICASSP.2009.4959660
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_4959660</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4959660</ieee_id><sourcerecordid>4959660</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-b133e4f81e5b99e342f44e20c356d56446f5fe00ec40f6172e29da5fbb0467d03</originalsourceid><addsrcrecordid>eNpVkMtqwzAURNUXNKT5gmz0A0qvpCs5WpbQF6QkkBa6KUG2r1oVv7DsRfr1DTSbzmYWcxiYYWwuYSEluNvn1d1ut10oALdAZ5y1cMZmLltKVIhKGzTnbKJ05oR08H7xL9PLSzaRRoGwEt01m6X0DUeh0RLNhH1sOmpetiL3iUre-d5XFVU81l1FNTWDH2Lb8DZwz4u2GWIztmPiqSMqvnhPRfvZxB_q-RHyvB6rIYqi7YmnQxqovmFXwVeJZiefsreH-9fVk1hvHo-z1iLKzAwil1oThqUkkztHGlVAJAWFNrY0FtEGEwiACoRgZaZIudKbkOeANitBT9n8rzcS0b7rY-37w_50lv4F37paYw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Kisun You ; Youngjoon Lee ; Wonyong Sung</creator><creatorcontrib>Kisun You ; Youngjoon Lee ; Wonyong Sung</creatorcontrib><description>We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. However, the search network involved in the Viterbi beam search is statically partitioned into independent subtrees to reduce memory synchronization overhead. In order to further improve the performance, a workload predictive thread assignment strategy as well as a false cache line sharing prevention method are employed. The test was conducted using WSJ1 20 k test and development set. 
We achieved the speed-up of 3.90 by utilizing four threads parallelization in a four-core system compared to four copies of the baseline single thread speech recognizer running simultaneously. The final recognition system runs about twice the speed of the real-time requirement.</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 9781424423538</identifier><identifier>ISBN: 1424423538</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 9781424423545</identifier><identifier>EISBN: 1424423546</identifier><identifier>DOI: 10.1109/ICASSP.2009.4959660</identifier><language>eng</language><publisher>IEEE</publisher><subject>Distributed computing ; Libraries ; Load management ; OpenMP ; Parallel processing ; Parallelization ; Real time systems ; Scalability ; Speech recognition ; Testing ; Viterbi algorithm ; Yarn</subject><ispartof>2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.621-624</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4959660$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4959660$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Kisun You</creatorcontrib><creatorcontrib>Youngjoon Lee</creatorcontrib><creatorcontrib>Wonyong Sung</creatorcontrib><title>OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system</title><title>2009 IEEE International Conference on Acoustics, Speech and Signal Processing</title><addtitle>ICASSP</addtitle><description>We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. 
A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. However, the search network involved in the Viterbi beam search is statically partitioned into independent subtrees to reduce memory synchronization overhead. In order to further improve the performance, a workload predictive thread assignment strategy as well as a false cache line sharing prevention method are employed. The test was conducted using WSJ1 20 k test and development set. We achieved the speed-up of 3.90 by utilizing four threads parallelization in a four-core system compared to four copies of the baseline single thread speech recognizer running simultaneously. The final recognition system runs about twice the speed of the real-time requirement.</description><subject>Distributed computing</subject><subject>Libraries</subject><subject>Load management</subject><subject>OpenMP</subject><subject>Parallel processing</subject><subject>Parallelization</subject><subject>Real time systems</subject><subject>Scalability</subject><subject>Speech recognition</subject><subject>Testing</subject><subject>Viterbi 
algorithm</subject><subject>Yarn</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>9781424423538</isbn><isbn>1424423538</isbn><isbn>9781424423545</isbn><isbn>1424423546</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpVkMtqwzAURNUXNKT5gmz0A0qvpCs5WpbQF6QkkBa6KUG2r1oVv7DsRfr1DTSbzmYWcxiYYWwuYSEluNvn1d1ut10oALdAZ5y1cMZmLltKVIhKGzTnbKJ05oR08H7xL9PLSzaRRoGwEt01m6X0DUeh0RLNhH1sOmpetiL3iUre-d5XFVU81l1FNTWDH2Lb8DZwz4u2GWIztmPiqSMqvnhPRfvZxB_q-RHyvB6rIYqi7YmnQxqovmFXwVeJZiefsreH-9fVk1hvHo-z1iLKzAwil1oThqUkkztHGlVAJAWFNrY0FtEGEwiACoRgZaZIudKbkOeANitBT9n8rzcS0b7rY-37w_50lv4F37paYw</recordid><startdate>200904</startdate><enddate>200904</enddate><creator>Kisun You</creator><creator>Youngjoon Lee</creator><creator>Wonyong Sung</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>200904</creationdate><title>OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system</title><author>Kisun You ; Youngjoon Lee ; Wonyong Sung</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-b133e4f81e5b99e342f44e20c356d56446f5fe00ec40f6172e29da5fbb0467d03</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Distributed computing</topic><topic>Libraries</topic><topic>Load management</topic><topic>OpenMP</topic><topic>Parallel processing</topic><topic>Parallelization</topic><topic>Real time systems</topic><topic>Scalability</topic><topic>Speech recognition</topic><topic>Testing</topic><topic>Viterbi algorithm</topic><topic>Yarn</topic><toplevel>online_resources</toplevel><creatorcontrib>Kisun You</creatorcontrib><creatorcontrib>Youngjoon 
Lee</creatorcontrib><creatorcontrib>Wonyong Sung</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kisun You</au><au>Youngjoon Lee</au><au>Wonyong Sung</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system</atitle><btitle>2009 IEEE International Conference on Acoustics, Speech and Signal Processing</btitle><stitle>ICASSP</stitle><date>2009-04</date><risdate>2009</risdate><spage>621</spage><epage>624</epage><pages>621-624</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>9781424423538</isbn><isbn>1424423538</isbn><eisbn>9781424423545</eisbn><eisbn>1424423546</eisbn><abstract>We have implemented a 20,000-word continuous speech recognizer on a multi-core based system. A fine grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. In the emission probability computation, a dynamic workload distribution method is employed for good load balancing. However, the search network involved in the Viterbi beam search is statically partitioned into independent subtrees to reduce memory synchronization overhead. In order to further improve the performance, a workload predictive thread assignment strategy as well as a false cache line sharing prevention method are employed. The test was conducted using WSJ1 20 k test and development set. 
We achieved the speed-up of 3.90 by utilizing four threads parallelization in a four-core system compared to four copies of the baseline single thread speech recognizer running simultaneously. The final recognition system runs about twice the speed of the real-time requirement.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2009.4959660</doi><tpages>4</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1520-6149
ispartof	2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.621-624
issn	1520-6149 2379-190X
language	eng
recordid	cdi_ieee_primary_4959660
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Distributed computing ; Libraries ; Load management ; OpenMP ; Parallel processing ; Parallelization ; Real time systems ; Scalability ; Speech recognition ; Testing ; Viterbi algorithm ; Yarn
title	OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T05%3A07%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=OpenMP-based%20parallel%20implementation%20of%20a%20continuous%20speech%20recognizer%20on%20a%20multi-core%20system&rft.btitle=2009%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Kisun%20You&rft.date=2009-04&rft.spage=621&rft.epage=624&rft.pages=621-624&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424423538&rft.isbn_list=1424423538&rft_id=info:doi/10.1109/ICASSP.2009.4959660&rft_dat=%3Cieee_6IE%3E4959660%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424423545&rft.eisbn_list=1424423546&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4959660&rfr_iscdi=true
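
The description states that the search network is statically partitioned into independent subtrees and that threads are assigned using predicted workload, but this record does not say how the assignment is computed. The following is only a minimal sketch of one plausible approach, assuming a greedy longest-processing-time heuristic; it is a stand-in chosen for illustration, not the authors' method. The function names `assign_subtrees` and `thread_loads` and the load figures are hypothetical.

```python
import heapq


def assign_subtrees(predicted_load, num_threads):
    """Assign each subtree to a thread, heaviest subtree first.

    predicted_load[i] is an assumed workload estimate for subtree i
    (for example, the number of active states seen in the previous frame).
    Returns one list of subtree indices per thread.
    """
    # Min-heap of (accumulated load, thread id): the least-loaded thread
    # is always at the top.
    heap = [(0, t) for t in range(num_threads)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_threads)]

    # Visit subtrees in order of decreasing predicted load.
    order = sorted(range(len(predicted_load)), key=lambda i: -predicted_load[i])
    for subtree in order:
        load, thread = heapq.heappop(heap)
        assignment[thread].append(subtree)
        heapq.heappush(heap, (load + predicted_load[subtree], thread))
    return assignment


def thread_loads(predicted_load, assignment):
    """Total predicted load carried by each thread."""
    return [sum(predicted_load[i] for i in group) for group in assignment]


if __name__ == "__main__":
    loads = [8, 7, 6, 5, 4, 3, 2, 1]  # invented figures, illustration only
    groups = assign_subtrees(loads, num_threads=4)
    print(groups)
    print(thread_loads(loads, groups))
```

Because each subtree is owned by exactly one thread, the threads do not need to synchronize on shared search state within a frame, which is the motivation the description gives for static partitioning. The heuristic only balances the predicted totals; how well that matches the real workload depends on the quality of the prediction.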