Bayesian nonparametric language models
Backoff smoothing and topic modeling are crucial issues in n-gram language models. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model where the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and to the modified Kneser-Ney (MKN) LM by further disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-the-art methods using MKN-LM and HPY-LM.
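Since the abstract centers on Pitman-Yor smoothing, a small sketch may help make the mechanism concrete. The snippet below is an illustrative Python sketch, not the authors' THPY-LM implementation: it computes the predictive probability of a single Pitman-Yor "restaurant" (one n-gram context), the per-context building block of hierarchical PY language models. The counts, table assignments, and base distribution are hypothetical stand-ins; with strength zero and one table per word type, the same formula recovers a Kneser-Ney-style interpolated discount, which is the reduction the abstract alludes to.

```python
# Illustrative sketch (not the paper's implementation): predictive probability
# of one Pitman-Yor "restaurant", i.e. a single n-gram context, with
#   p(w) = (c_w - d*t_w)/(theta + c) + (theta + d*t)/(theta + c) * p_base(w)
# where c_w/t_w are customer/table counts for word w, and c/t their totals.

def pitman_yor_predictive(word, counts, tables, d, theta, base_prob):
    """counts: token counts per word in this context; tables: CRP table counts;
    d: discount in [0, 1); theta: strength parameter (> -d);
    base_prob: callable giving the backoff (lower-order) distribution."""
    c = sum(counts.values())   # total customers in this restaurant
    t = sum(tables.values())   # total tables in this restaurant
    if c == 0:                 # empty context: back off completely
        return base_prob(word)
    seated = max(counts.get(word, 0) - d * tables.get(word, 0), 0.0)
    return (seated + (theta + d * t) * base_prob(word)) / (theta + c)

# Hypothetical counts for one context; uniform stand-in for the backoff model.
counts = {"cat": 3, "dog": 1}
tables = {"cat": 1, "dog": 1}
uniform = lambda w: 1.0 / 10000
print(pitman_yor_predictive("cat", counts, tables,
                            d=0.5, theta=1.0, base_prob=uniform))
# With theta = 0 and exactly one table per observed word type, the discounted
# count (c_w - d) plus d * (number of types) * base_prob(w), normalized by c,
# is interpolated Kneser-Ney smoothing.
```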
Field | Value |
---|---|
container_start_page | 188 |
container_end_page | 192 |
container_title | 2012 8th International Symposium on Chinese Spoken Language Processing |
creator | Ying-Lan Chang ; Jen-Tzung Chien |
description | Backoff smoothing and topic modeling are crucial issues in n-gram language models. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model where the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and to the modified Kneser-Ney (MKN) LM by further disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-the-art methods using MKN-LM and HPY-LM. |
doi_str_mv | 10.1109/ISCSLP.2012.6423460 |
format | Conference Proceeding |
publisher | IEEE |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 1467325066 |
ispartof | 2012 8th International Symposium on Chinese Spoken Language Processing, 2012, p.188-192 |
language | eng |
recordid | cdi_ieee_primary_6423460 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | backoff smoothing ; Bayesian methods ; Bayesian nonparametrics ; Computational modeling ; Context ; Data models ; language model ; Smoothing methods ; Speech ; Speech recognition ; topic model |
title | Bayesian nonparametric language models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T23%3A43%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Bayesian%20nonparametric%20language%20models&rft.btitle=2012%208th%20International%20Symposium%20on%20Chinese%20Spoken%20Language%20Processing&rft.au=Ying-Lan%20Chang&rft.date=2012-12&rft.spage=188&rft.epage=192&rft.pages=188-192&rft.isbn=1467325066&rft.isbn_list=9781467325066&rft_id=info:doi/10.1109/ISCSLP.2012.6423460&rft_dat=%3Cieee_6IE%3E6423460%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1467325074&rft.eisbn_list=1467325058&rft.eisbn_list=9781467325059&rft.eisbn_list=9781467325073&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6423460&rfr_iscdi=true |