Bayesian nonparametric language models
Backoff smoothing and topic modeling are crucial issues in n-gram language models. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model where the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and to the modified Kneser-Ney (MKN) LM by further disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-the-art methods using MKN-LM and HPY-LM.
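Since the abstract centers on Pitman-Yor smoothing, a small sketch may help make the mechanism concrete. The snippet below is an illustrative Python sketch, not the authors' THPY-LM implementation: it computes the predictive probability of a single Pitman-Yor "restaurant" (one n-gram context), the per-context building block of hierarchical PY language models. The counts, table assignments, and base distribution are hypothetical stand-ins; with strength zero and one table per word type, the same formula recovers a Kneser-Ney-style interpolated discount, which is the reduction the abstract alludes to.

```python
# Illustrative sketch (not the paper's implementation): predictive probability
# of one Pitman-Yor "restaurant", i.e. a single n-gram context, with
#   p(w) = (c_w - d*t_w)/(theta + c) + (theta + d*t)/(theta + c) * p_base(w)
# where c_w/t_w are customer/table counts for word w, and c/t their totals.

def pitman_yor_predictive(word, counts, tables, d, theta, base_prob):
    """counts: token counts per word in this context; tables: CRP table counts;
    d: discount in [0, 1); theta: strength parameter (> -d);
    base_prob: callable giving the backoff (lower-order) distribution."""
    c = sum(counts.values())   # total customers in this restaurant
    t = sum(tables.values())   # total tables in this restaurant
    if c == 0:                 # empty context: back off completely
        return base_prob(word)
    seated = max(counts.get(word, 0) - d * tables.get(word, 0), 0.0)
    return (seated + (theta + d * t) * base_prob(word)) / (theta + c)

# Hypothetical counts for one context; uniform stand-in for the backoff model.
counts = {"cat": 3, "dog": 1}
tables = {"cat": 1, "dog": 1}
uniform = lambda w: 1.0 / 10000
print(pitman_yor_predictive("cat", counts, tables,
                            d=0.5, theta=1.0, base_prob=uniform))
# With theta = 0 and exactly one table per observed word type, the discounted
# count (c_w - d) plus d * (number of types) * base_prob(w), normalized by c,
# is interpolated Kneser-Ney smoothing.
```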
Field | Value |
---|---|
container_start_page | 188 |
container_end_page | 192 |
container_title | 2012 8th International Symposium on Chinese Spoken Language Processing |
creator | Ying-Lan Chang ; Jen-Tzung Chien |
description | Backoff smoothing and topic modeling are crucial issues in n-gram language models. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model where the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and to the modified Kneser-Ney (MKN) LM by further disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-the-art methods using MKN-LM and HPY-LM. |
doi_str_mv | 10.1109/ISCSLP.2012.6423460 |
format | Conference Proceeding |
publisher | IEEE |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 1467325066 |
ispartof | 2012 8th International Symposium on Chinese Spoken Language Processing, 2012, p.188-192 |
language | eng |
recordid | cdi_ieee_primary_6423460 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | backoff smoothing ; Bayesian methods ; Bayesian nonparametrics ; Computational modeling ; Context ; Data models ; language model ; Smoothing methods ; Speech ; Speech recognition ; topic model |
title | Bayesian nonparametric language models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T23%3A43%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Bayesian%20nonparametric%20language%20models&rft.btitle=2012%208th%20International%20Symposium%20on%20Chinese%20Spoken%20Language%20Processing&rft.au=Ying-Lan%20Chang&rft.date=2012-12&rft.spage=188&rft.epage=192&rft.pages=188-192&rft.isbn=1467325066&rft.isbn_list=9781467325066&rft_id=info:doi/10.1109/ISCSLP.2012.6423460&rft_dat=%3Cieee_6IE%3E6423460%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1467325074&rft.eisbn_list=1467325058&rft.eisbn_list=9781467325059&rft.eisbn_list=9781467325073&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6423460&rfr_iscdi=true |