Bayesian nonparametric modeling of hierarchical topics and sentences

Automatically scoring the sentences of multiple documents plays an important role in document summarization. This study presents a new Bayesian nonparametric approach for unsupervised learning of a hierarchical topic and sentence model (HTSM). The HTSM discovers an extended hierarchy in the nested Chinese restaurant process (nCRP), where each sentence is assigned a hierarchical topic path.

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Ying-Lan Chang, Jui-Jung Hung, Jen-Tzung Chien
Format: Conference Proceedings
Language: eng
Subjects:
Online Access: Order full text
container_end_page 6
container_issue
container_start_page 1
container_title
container_volume
creator Ying-Lan Chang
Jui-Jung Hung
Jen-Tzung Chien
description Automatically scoring the sentences of multiple documents plays an important role in document summarization. This study presents a new Bayesian nonparametric approach for unsupervised learning of a hierarchical topic and sentence model (HTSM). The HTSM discovers an extended hierarchy in the nested Chinese restaurant process (nCRP), where each sentence is assigned a hierarchical topic path. A tree structure with distributions ranging from broad topics to precise topics is established, and the dependencies among sentences are characterized. The words in different sentences are represented by a shared hierarchical Dirichlet process (HDP). The topic mixtures at the word level and the sentence level are estimated by unsupervised nonparametric processes based on the HDP and the nCRP, respectively. Compared with the nCRP, which represents a document by a single path, the proposed HTSM is more flexible: a new nCRP incorporates multiple paths to generate the different sentences of a document. A summarization system is developed to extract semantically rich sentences from documents, and a new Gibbs sampling algorithm is developed to infer the structural parameters of the HTSM. In experiments on the DUC corpus, the proposed HTSM outperforms the other methods for document summarization in terms of ROUGE measures.
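For illustration only, the nCRP path assignment described in the abstract can be sketched as follows: at each level of the topic tree, a sentence joins an existing child topic with probability proportional to how many sentences already chose it, or opens a new child with probability proportional to a concentration parameter gamma. The node structure, function name, and parameters below are assumptions for this sketch, not the authors' implementation.

```python
import random

def sample_ncrp_path(root, depth, gamma=1.0):
    """Sample one root-to-leaf path through a nested CRP tree.

    At each level, an existing child is chosen with probability
    proportional to its customer count, and a brand-new child with
    probability proportional to gamma (the concentration parameter).
    A node is a dict: {"count": int, "children": {name: node}}.
    """
    path = [root]
    node = root
    for level in range(depth):
        children = node["children"]
        total = sum(c["count"] for c in children.values()) + gamma
        r = random.uniform(0, total)
        chosen = None
        for child in children.values():
            r -= child["count"]
            if r <= 0:
                chosen = child
                break
        if chosen is None:  # open a new table (topic) at this level
            chosen = {"count": 0, "children": {}}
            children[len(children)] = chosen
        chosen["count"] += 1  # seat this sentence at the chosen topic
        path.append(chosen)
        node = chosen
    return path
```

In the paper's setting a Gibbs sampler would resample these per-sentence paths jointly with the HDP word-level assignments; the sketch above shows only the generative "rich-get-richer" draw that makes the number of topics per level unbounded.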
doi_str_mv 10.1109/MLSP.2011.6064569
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 1551-2541
ispartof 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011, p.1-6
issn 1551-2541
2378-928X
language eng
recordid cdi_ieee_primary_6064569
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Approximation algorithms
Bayesian methods
Bayesian nonparametrics
Data models
document summarization
Graphical models
Resource management
Topic model
Unsupervised learning
Vocabulary
title Bayesian nonparametric modeling of hierarchical topics and sentences
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T05%3A42%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Bayesian%20nonparametric%20modeling%20of%20hierarchical%20topics%20and%20sentences&rft.btitle=2011%20IEEE%20International%20Workshop%20on%20Machine%20Learning%20for%20Signal%20Processing&rft.au=Ying-Lan%20Chang&rft.date=2011-09&rft.spage=1&rft.epage=6&rft.pages=1-6&rft.issn=1551-2541&rft.eissn=2378-928X&rft.isbn=1457716216&rft.isbn_list=9781457716218&rft_id=info:doi/10.1109/MLSP.2011.6064569&rft_dat=%3Cieee_6IE%3E6064569%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781457716232&rft.eisbn_list=1457716232&rft.eisbn_list=1457716224&rft.eisbn_list=9781457716225&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6064569&rfr_iscdi=true