A flat direct model for speech recognition

We introduce a direct model for speech recognition that assumes an unstructured, i.e., flat text output. The flat model allows us to model arbitrary attributes and dependences of the output. This is different from the HMMs typically used for speech recognition. This conventional modeling approach is...

Bibliographic details
Main authors:	Heigold, G., Zweig, G., Li, X., Nguyen, P.
Format:	Conference proceeding
Language:	English
Keywords:	Cellular phones; Computer science; Detectors; Entropy; Hidden Markov models; language model; maximum entropy; Natural languages; nearest neighbor; Portals; Speech recognition; Testing; Vocabulary; voice search

container_end_page	3864
container_start_page	3861
creator	Heigold, G.; Zweig, G.; Li, X.; Nguyen, P.
description	We introduce a direct model for speech recognition that assumes an unstructured, i.e., flat text output. The flat model allows us to model arbitrary attributes and dependences of the output. This is different from the HMMs typically used for speech recognition. This conventional modeling approach is based on sequential data and makes rigid assumptions about the dependences. HMMs have proven convenient and appropriate for large-vocabulary continuous speech recognition. The task considered here, however, is the Windows Live Search for Mobile (WLS4M) task, a cellphone application that lets users interact with web-based information portals. In particular, the set of valid outputs can be considered discrete and finite (although probably large, so unseen events are an issue). Hence, a flat direct model lends itself to this task, making it straightforward and cheap to add different knowledge sources and dependences. Using, e.g., HMM posterior, m-gram, and spotter features, significant improvements over the conventional HMM system were observed.
doi_str_mv	10.1109/ICASSP.2009.4960470
format	Conference Proceeding
isbn	9781424423538 1424423538
eisbn	9781424423545 1424423546
publisher	IEEE
date	2009-04
link	https://ieeexplore.ieee.org/document/4960470
fulltext	fulltext_linktorsrc
identifier	ISSN: 1520-6149
ispartof	2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, p.3861-3864
issn	1520-6149 2379-190X
language	eng
recordid	cdi_ieee_primary_4960470
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Cellular phones; Computer science; Detectors; Entropy; Hidden Markov models; language model; maximum entropy; Natural languages; nearest neighbor; Portals; Speech recognition; Testing; Vocabulary; voice search
title	A flat direct model for speech recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T18%3A27%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20flat%20direct%20model%20for%20speech%20recognition&rft.btitle=2009%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Heigold,%20G.&rft.date=2009-04&rft.spage=3861&rft.epage=3864&rft.pages=3861-3864&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424423538&rft.isbn_list=1424423538&rft_id=info:doi/10.1109/ICASSP.2009.4960470&rft_dat=%3Cieee_6IE%3E4960470%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424423545&rft.eisbn_list=1424423546&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4960470&rfr_iscdi=true
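The description field outlines the core idea: because the valid outputs form a finite set, each candidate output can be scored directly with a weighted combination of arbitrary features, rather than decoded through a sequential HMM. Below is a minimal Python sketch of that idea. It assumes a log-linear (maximum entropy) form, which the record's "maximum entropy" keyword suggests, and is not the authors' implementation. The utterance fields, the three feature functions (loose stand-ins for the HMM posterior, m-gram, and spotter features named in the abstract), the candidate listings, and the weights are all hypothetical illustrations. Weight training is omitted.

```python
import math
from typing import Callable, Dict, List

# Hypothetical utterance representation: precomputed outputs of other systems.
Utterance = dict
# A feature maps (utterance x, candidate output w) to a real number f(x, w).
Feature = Callable[[Utterance, str], float]


def log_linear_posterior(x: Utterance, candidates: List[str],
                         features: Dict[str, Feature],
                         weights: Dict[str, float]) -> Dict[str, float]:
    """p(w | x) = exp(sum_i lambda_i * f_i(x, w)) / Z(x) over a finite output set."""
    scores = {w: sum(weights[name] * f(x, w) for name, f in features.items())
              for w in candidates}
    top = max(scores.values())  # shift by the max score for numerical stability
    unnorm = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(unnorm.values())  # normalizer Z(x) over the whole candidate set
    return {w: u / z for w, u in unnorm.items()}


def bigrams(text: str) -> set:
    tokens = text.split()
    return set(zip(tokens, tokens[1:]))


# Hypothetical features, loosely mirroring the knowledge sources in the abstract.
def hmm_log_posterior(x: Utterance, w: str) -> float:
    # Log posterior of w from a baseline HMM system, floored to avoid log(0).
    return math.log(max(x["hmm_posteriors"].get(w, 0.0), 1e-10))


def bigram_overlap(x: Utterance, w: str) -> float:
    # Number of word bigrams (m = 2) shared by w and the HMM 1-best string.
    return float(len(bigrams(w) & bigrams(x["hmm_1best"])))


def spotter_fraction(x: Utterance, w: str) -> float:
    # Fraction of the words in w that a (hypothetical) keyword spotter detected.
    tokens = w.split()
    return sum(t in x["spotted_words"] for t in tokens) / len(tokens)


features = {"hmm": hmm_log_posterior, "bigram": bigram_overlap,
            "spotter": spotter_fraction}
weights = {"hmm": 1.0, "bigram": 0.5, "spotter": 1.0}  # hand-set, not trained

x = {
    "hmm_posteriors": {"pizza hut": 0.5, "pizza place": 0.4, "home depot": 0.1},
    "hmm_1best": "pizza hut seattle",
    "spotted_words": {"pizza", "hut"},
}
candidates = ["pizza hut", "pizza place", "home depot"]
posterior = log_linear_posterior(x, candidates, features, weights)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this form, adding a new knowledge source means adding one entry to `features` and one weight, which illustrates why the abstract calls extending a flat model "straightforward and cheap". In practice the weights would be estimated from data, e.g. by maximizing the conditional log-likelihood of the correct outputs.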