Bayesian nonparametric modeling of hierarchical topics and sentences

Automatically scoring the sentences of multiple documents plays an important role in document summarization. This study presents a new Bayesian nonparametric approach for unsupervised learning of a hierarchical topic and sentence model (HTSM). The HTSM discovers an extended hierarchy in the nested Chinese restaurant process (nCRP), where each sentence is assigned a hierarchical topic path.

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Ying-Lan Chang, Jui-Jung Hung, Jen-Tzung Chien
Format: Conference Proceedings
Language: eng
Subjects:
Online Access: Order full text
container_end_page 6
container_issue
container_start_page 1
container_title
container_volume
creator Ying-Lan Chang
Jui-Jung Hung
Jen-Tzung Chien
description Automatically scoring the sentences of multiple documents plays an important role in document summarization. This study presents a new Bayesian nonparametric approach for unsupervised learning of a hierarchical topic and sentence model (HTSM). The HTSM discovers an extended hierarchy in the nested Chinese restaurant process (nCRP), where each sentence is assigned a hierarchical topic path. A tree structure with distributions ranging from broad topics to precise topics is established, and the dependencies among sentences are characterized. The words in different sentences are represented by a shared hierarchical Dirichlet process (HDP). The topic mixtures at the word level and the sentence level are estimated by unsupervised nonparametric processes based on the HDP and the nCRP, respectively. Compared with the nCRP, which represents a document by a single path, the proposed HTSM is more flexible: a new nCRP incorporates multiple paths to generate the different sentences of a document. A summarization system is developed to extract semantically rich sentences from documents, and a new Gibbs sampling algorithm is developed to infer the structural parameters of the HTSM. In experiments on the DUC corpus, the proposed HTSM outperforms the other methods for document summarization in terms of ROUGE measures.
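For illustration only, the nCRP path assignment described in the abstract can be sketched as follows: at each level of the topic tree, a sentence joins an existing child topic with probability proportional to how many sentences already chose it, or opens a new child with probability proportional to a concentration parameter gamma. The node structure, function name, and parameters below are assumptions for this sketch, not the authors' implementation.

```python
import random

def sample_ncrp_path(root, depth, gamma=1.0):
    """Sample one root-to-leaf path through a nested CRP tree.

    At each level, an existing child is chosen with probability
    proportional to its customer count, and a brand-new child with
    probability proportional to gamma (the concentration parameter).
    A node is a dict: {"count": int, "children": {name: node}}.
    """
    path = [root]
    node = root
    for level in range(depth):
        children = node["children"]
        total = sum(c["count"] for c in children.values()) + gamma
        r = random.uniform(0, total)
        chosen = None
        for child in children.values():
            r -= child["count"]
            if r <= 0:
                chosen = child
                break
        if chosen is None:  # open a new table (topic) at this level
            chosen = {"count": 0, "children": {}}
            children[len(children)] = chosen
        chosen["count"] += 1  # seat this sentence at the chosen topic
        path.append(chosen)
        node = chosen
    return path
```

In the paper's setting a Gibbs sampler would resample these per-sentence paths jointly with the HDP word-level assignments; the sketch above shows only the generative "rich-get-richer" draw that makes the number of topics per level unbounded.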
doi_str_mv 10.1109/MLSP.2011.6064569
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 1551-2541
ispartof 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011, p.1-6
issn 1551-2541
2378-928X
language eng
recordid cdi_ieee_primary_6064569
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Approximation algorithms
Bayesian methods
Bayesian nonparametrics
Data models
document summarization
Graphical models
Resource management
Topic model
Unsupervised learning
Vocabulary
title Bayesian nonparametric modeling of hierarchical topics and sentences
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T05%3A42%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Bayesian%20nonparametric%20modeling%20of%20hierarchical%20topics%20and%20sentences&rft.btitle=2011%20IEEE%20International%20Workshop%20on%20Machine%20Learning%20for%20Signal%20Processing&rft.au=Ying-Lan%20Chang&rft.date=2011-09&rft.spage=1&rft.epage=6&rft.pages=1-6&rft.issn=1551-2541&rft.eissn=2378-928X&rft.isbn=1457716216&rft.isbn_list=9781457716218&rft_id=info:doi/10.1109/MLSP.2011.6064569&rft_dat=%3Cieee_6IE%3E6064569%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781457716232&rft.eisbn_list=1457716232&rft.eisbn_list=1457716224&rft.eisbn_list=9781457716225&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6064569&rfr_iscdi=true