Bayesian nonparametric language models

Backoff smoothing and topic modeling are crucial issues in n-gram language models. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model where the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and further to the modified Kneser-Ney (MKN) LM by additionally disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-the-art methods using MKN-LM and HPY-LM.
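For context on the backoff structure the abstract describes, the hierarchical Pitman-Yor (HPY) language model named as a baseline is usually stated through its predictive probability. The sketch below follows the standard HPY-LM formulation from Teh (2006), not notation reproduced from this paper, so the symbols are assumptions:

P(w \mid u) = \frac{c_{uw} - d_{|u|}\, t_{uw}}{\theta_{|u|} + c_{u\cdot}} + \frac{\theta_{|u|} + d_{|u|}\, t_{u\cdot}}{\theta_{|u|} + c_{u\cdot}}\, P(w \mid \pi(u))

Here u is the n-gram context and \pi(u) the backoff context obtained by dropping the earliest word; c_{uw} and t_{uw} are the customer and table counts of the Chinese-restaurant-process representation, and d_{|u|}, \theta_{|u|} are the discount and concentration parameters of the PY prior. Setting \theta_{|u|} = 0 with one table per observed type recovers interpolated Kneser-Ney, which is the simplification to MKN-LM that the abstract mentions.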

Detailed Description

Bibliographic Details
Main Authors: Ying-Lan Chang, Jen-Tzung Chien
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Order full text
container_end_page 192
container_start_page 188
container_title 2012 8th International Symposium on Chinese Spoken Language Processing
creator Ying-Lan Chang
Jen-Tzung Chien
description Backoff smoothing and topic modeling are crucial issues in n-gram language models. This paper presents a Bayesian nonparametric learning approach to tackle these two issues. We develop a topic-based language model where the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and further to the modified Kneser-Ney (MKN) LM by additionally disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-the-art methods using MKN-LM and HPY-LM.
doi_str_mv 10.1109/ISCSLP.2012.6423460
format Conference Proceeding
identifier ISBN: 1467325066
ispartof 2012 8th International Symposium on Chinese Spoken Language Processing, 2012, p.188-192
language eng
recordid cdi_ieee_primary_6423460
source IEEE Electronic Library (IEL) Conference Proceedings
subjects backoff smoothing
Bayesian methods
Bayesian nonparametrics
Computational modeling
Context
Data models
language model
Smoothing methods
Speech
Speech recognition
topic model
title Bayesian nonparametric language models