Bayesian nonparametric modeling of hierarchical topics and sentences
Automatically scoring the sentences of multiple documents plays an important role for document summarization. This study presents a new Bayesian nonparametric approach to conduct unsupervised learning of a hierarchical topic and sentence model (HTSM). This HTSM discovers an extended hierarchy in the...
Saved in:
Main authors: | Ying-Lan Chang, Jui-Jung Hung, Jen-Tzung Chien |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 6 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | |
container_volume | |
creator | Ying-Lan Chang; Jui-Jung Hung; Jen-Tzung Chien |
description | Automatically scoring the sentences of multiple documents plays an important role for document summarization. This study presents a new Bayesian nonparametric approach to conduct unsupervised learning of a hierarchical topic and sentence model (HTSM). This HTSM discovers an extended hierarchy in the nested Chinese restaurant process (nCRP) where each sentence is assigned by a hierarchical topic path. A tree structure with distributions ranging from broad topics to precise topics is established. The dependencies among sentences are characterized. The words in different sentences are represented by a shared hierarchical Dirichlet process (HDP). The topic mixtures in word level and sentence level are estimated according to unsupervised nonparametric processes based on HDP and nCRP, respectively. Compared with the nCRP representing a document based on a single path, the proposed HTSM is flexible with a new nCRP where multiple paths are incorporated to generate different sentences of a document. A summarization system is developed to extract semantically-rich sentences from documents. A new Gibbs sampling algorithm is developed to infer the structural parameters of HTSM. In the experiments on DUC corpus, the proposed HTSM outperforms the other methods for document summarization in terms of ROUGE measures. |
doi_str_mv | 10.1109/MLSP.2011.6064569 |
format | Conference Proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6064569</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6064569</ieee_id><sourcerecordid>6064569</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-40238113a283e742c71cdb59f88bac06c29ba1068485f59ac9253398f105718d3</originalsourceid><addsrcrecordid>eNo1kLtOwzAUQM1LIi39AMTiH0jw9dsjlKcUBBIgsVU3jkONEqeKs_TvGSjTGY50hkPIJbAKgLnrl_r9reIMoNJMS6XdEVk5Y0EqY0BzwY9JwYWxpeP264Qs_gXoU1KAUlByJeGcLHL-YUxyAVCQu1vchxwx0TSmHU44hHmKng5jG_qYvunY0W0ME05-Gz32dB530WeKqaU5pDkkH_IFOeuwz2F14JJ8Ptx_rJ_K-vXxeX1TlxGMmkvJuLAAArkVwUjuDfi2Ua6ztkHPtOeuQWDaSqs65dA7roRwtgOmDNhWLMnVXzeGEDa7KQ447TeHG-IXC8pOkA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Bayesian nonparametric modeling of hierarchical topics and sentences</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Ying-Lan Chang ; Jui-Jung Hung ; Jen-Tzung Chien</creator><creatorcontrib>Ying-Lan Chang ; Jui-Jung Hung ; Jen-Tzung Chien</creatorcontrib><description>Automatically scoring the sentences of multiple documents plays an important role for document summarization. This study presents a new Bayesian nonparametric approach to conduct unsupervised learning of a hierarchical topic and sentence model (HTSM). This HTSM discovers an extended hierarchy in the nested Chinese restaurant process (nCRP) where each sentence is assigned by a hierarchical topic path. A tree structure with distributions ranging from broad topics to precise topics is established. The dependencies among sentences are characterized. The words in different sentences are represented by a shared hierarchical Dirichlet process (HDP). The topic mixtures in word level and sentence level are estimated according to unsupervised nonparametric processes based on HDP and nCRP, respectively. 
Compared with the nCRP representing a document based on a single path, the proposed HTSM is flexible with a new nCRP where multiple paths are incorporated to generate different sentences of a document. A summarization system is developed to extract semantically-rich sentences from documents. A new Gibbs sampling algorithm is developed to infer the structural parameters of HTSM. In the experiments on DUC corpus, the proposed HTSM outperforms the other methods for document summarization in terms of ROUGE measures.</description><identifier>ISSN: 1551-2541</identifier><identifier>ISBN: 1457716216</identifier><identifier>ISBN: 9781457716218</identifier><identifier>EISSN: 2378-928X</identifier><identifier>EISBN: 9781457716232</identifier><identifier>EISBN: 1457716232</identifier><identifier>EISBN: 1457716224</identifier><identifier>EISBN: 9781457716225</identifier><identifier>DOI: 10.1109/MLSP.2011.6064569</identifier><language>eng</language><publisher>IEEE</publisher><subject>Approximation algorithms ; Bayesian methods ; Bayesian nonparametrics ; Data models ; document summarization ; Graphical models ; Resource management ; Topic model ; Unsupervised learning ; Vocabulary</subject><ispartof>2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011, p.1-6</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6064569$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2051,27904,54899</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6064569$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Ying-Lan Chang</creatorcontrib><creatorcontrib>Jui-Jung Hung</creatorcontrib><creatorcontrib>Jen-Tzung Chien</creatorcontrib><title>Bayesian nonparametric 
modeling of hierarchical topics and sentences</title><title>2011 IEEE International Workshop on Machine Learning for Signal Processing</title><addtitle>MLSP</addtitle><description>Automatically scoring the sentences of multiple documents plays an important role for document summarization. This study presents a new Bayesian nonparametric approach to conduct unsupervised learning of a hierarchical topic and sentence model (HTSM). This HTSM discovers an extended hierarchy in the nested Chinese restaurant process (nCRP) where each sentence is assigned by a hierarchical topic path. A tree structure with distributions ranging from broad topics to precise topics is established. The dependencies among sentences are characterized. The words in different sentences are represented by a shared hierarchical Dirichlet process (HDP). The topic mixtures in word level and sentence level are estimated according to unsupervised nonparametric processes based on HDP and nCRP, respectively. Compared with the nCRP representing a document based on a single path, the proposed HTSM is flexible with a new nCRP where multiple paths are incorporated to generate different sentences of a document. A summarization system is developed to extract semantically-rich sentences from documents. A new Gibbs sampling algorithm is developed to infer the structural parameters of HTSM. 
In the experiments on DUC corpus, the proposed HTSM outperforms the other methods for document summarization in terms of ROUGE measures.</description><subject>Approximation algorithms</subject><subject>Bayesian methods</subject><subject>Bayesian nonparametrics</subject><subject>Data models</subject><subject>document summarization</subject><subject>Graphical models</subject><subject>Resource management</subject><subject>Topic model</subject><subject>Unsupervised learning</subject><subject>Vocabulary</subject><issn>1551-2541</issn><issn>2378-928X</issn><isbn>1457716216</isbn><isbn>9781457716218</isbn><isbn>9781457716232</isbn><isbn>1457716232</isbn><isbn>1457716224</isbn><isbn>9781457716225</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2011</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo1kLtOwzAUQM1LIi39AMTiH0jw9dsjlKcUBBIgsVU3jkONEqeKs_TvGSjTGY50hkPIJbAKgLnrl_r9reIMoNJMS6XdEVk5Y0EqY0BzwY9JwYWxpeP264Qs_gXoU1KAUlByJeGcLHL-YUxyAVCQu1vchxwx0TSmHU44hHmKng5jG_qYvunY0W0ME05-Gz32dB530WeKqaU5pDkkH_IFOeuwz2F14JJ8Ptx_rJ_K-vXxeX1TlxGMmkvJuLAAArkVwUjuDfi2Ua6ztkHPtOeuQWDaSqs65dA7roRwtgOmDNhWLMnVXzeGEDa7KQ447TeHG-IXC8pOkA</recordid><startdate>201109</startdate><enddate>201109</enddate><creator>Ying-Lan Chang</creator><creator>Jui-Jung Hung</creator><creator>Jen-Tzung Chien</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201109</creationdate><title>Bayesian nonparametric modeling of hierarchical topics and sentences</title><author>Ying-Lan Chang ; Jui-Jung Hung ; Jen-Tzung 
Chien</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-40238113a283e742c71cdb59f88bac06c29ba1068485f59ac9253398f105718d3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Approximation algorithms</topic><topic>Bayesian methods</topic><topic>Bayesian nonparametrics</topic><topic>Data models</topic><topic>document summarization</topic><topic>Graphical models</topic><topic>Resource management</topic><topic>Topic model</topic><topic>Unsupervised learning</topic><topic>Vocabulary</topic><toplevel>online_resources</toplevel><creatorcontrib>Ying-Lan Chang</creatorcontrib><creatorcontrib>Jui-Jung Hung</creatorcontrib><creatorcontrib>Jen-Tzung Chien</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ying-Lan Chang</au><au>Jui-Jung Hung</au><au>Jen-Tzung Chien</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Bayesian nonparametric modeling of hierarchical topics and sentences</atitle><btitle>2011 IEEE International Workshop on Machine Learning for Signal Processing</btitle><stitle>MLSP</stitle><date>2011-09</date><risdate>2011</risdate><spage>1</spage><epage>6</epage><pages>1-6</pages><issn>1551-2541</issn><eissn>2378-928X</eissn><isbn>1457716216</isbn><isbn>9781457716218</isbn><eisbn>9781457716232</eisbn><eisbn>1457716232</eisbn><eisbn>1457716224</eisbn><eisbn>9781457716225</eisbn><abstract>Automatically scoring the sentences of multiple 
documents plays an important role for document summarization. This study presents a new Bayesian nonparametric approach to conduct unsupervised learning of a hierarchical topic and sentence model (HTSM). This HTSM discovers an extended hierarchy in the nested Chinese restaurant process (nCRP) where each sentence is assigned by a hierarchical topic path. A tree structure with distributions ranging from broad topics to precise topics is established. The dependencies among sentences are characterized. The words in different sentences are represented by a shared hierarchical Dirichlet process (HDP). The topic mixtures in word level and sentence level are estimated according to unsupervised nonparametric processes based on HDP and nCRP, respectively. Compared with the nCRP representing a document based on a single path, the proposed HTSM is flexible with a new nCRP where multiple paths are incorporated to generate different sentences of a document. A summarization system is developed to extract semantically-rich sentences from documents. A new Gibbs sampling algorithm is developed to infer the structural parameters of HTSM. In the experiments on DUC corpus, the proposed HTSM outperforms the other methods for document summarization in terms of ROUGE measures.</abstract><pub>IEEE</pub><doi>10.1109/MLSP.2011.6064569</doi><tpages>6</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1551-2541 |
ispartof | 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011, p.1-6 |
issn | 1551-2541; 2378-928X |
language | eng |
recordid | cdi_ieee_primary_6064569 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Approximation algorithms; Bayesian methods; Bayesian nonparametrics; Data models; document summarization; Graphical models; Resource management; Topic model; Unsupervised learning; Vocabulary |
title | Bayesian nonparametric modeling of hierarchical topics and sentences |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T05%3A42%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Bayesian%20nonparametric%20modeling%20of%20hierarchical%20topics%20and%20sentences&rft.btitle=2011%20IEEE%20International%20Workshop%20on%20Machine%20Learning%20for%20Signal%20Processing&rft.au=Ying-Lan%20Chang&rft.date=2011-09&rft.spage=1&rft.epage=6&rft.pages=1-6&rft.issn=1551-2541&rft.eissn=2378-928X&rft.isbn=1457716216&rft.isbn_list=9781457716218&rft_id=info:doi/10.1109/MLSP.2011.6064569&rft_dat=%3Cieee_6IE%3E6064569%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781457716232&rft.eisbn_list=1457716232&rft.eisbn_list=1457716224&rft.eisbn_list=9781457716225&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6064569&rfr_iscdi=true |
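The nested Chinese restaurant process (nCRP) named in the abstract generalizes the basic Chinese restaurant process (CRP), in which each new customer joins an existing table with probability proportional to its occupancy or opens a new table with probability proportional to a concentration parameter. The following is a minimal illustrative sketch of that basic CRP only, not the paper's HTSM implementation; the function name, the `alpha` parameter value, and the fixed seed are all assumptions for demonstration:

```python
import random

def crp_assign(n_customers, alpha, seed=0):
    """Sample table assignments from a basic Chinese restaurant process.

    Customer i joins existing table k with probability proportional to
    tables[k] (its occupancy), or a new table with probability
    proportional to alpha.
    """
    rng = random.Random(seed)
    tables = []        # tables[k] = number of customers seated at table k
    assignments = []   # assignments[i] = table index chosen by customer i
    for _ in range(n_customers):
        weights = tables + [alpha]        # last slot = open a new table
        r = rng.uniform(0.0, sum(weights))
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(tables):
            tables.append(1)              # new table opened
        else:
            tables[k] += 1                # joined existing table
        assignments.append(k)
    return assignments, tables

assignments, tables = crp_assign(10, alpha=1.0)
```

In the nCRP used by the paper, this seating decision is applied recursively at every level of a topic tree, so each draw selects a root-to-leaf path rather than a single table; the HTSM extension described in the abstract lets different sentences of one document take different paths.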