Bayesian nonparametric modeling of hierarchical topics and sentences
Main authors: | , , |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Automatically scoring the sentences of multiple documents plays an important role in document summarization. This study presents a new Bayesian nonparametric approach to unsupervised learning of a hierarchical topic and sentence model (HTSM). The HTSM extends the hierarchy of the nested Chinese restaurant process (nCRP) so that each sentence is assigned its own hierarchical topic path. This yields a tree structure whose topic distributions range from broad topics to precise ones, and it characterizes the dependencies among sentences. The words in different sentences are represented by a shared hierarchical Dirichlet process (HDP). The topic mixtures at the word level and the sentence level are estimated by unsupervised nonparametric processes based on the HDP and the nCRP, respectively. Whereas the standard nCRP represents a document by a single path, the proposed HTSM uses a new nCRP in which multiple paths generate the different sentences of a document. A new Gibbs sampling algorithm is developed to infer the structural parameters of the HTSM, and a summarization system is built to extract semantically rich sentences from documents. In experiments on the DUC corpus, the proposed HTSM outperforms the other methods for document summarization in terms of ROUGE measures. |
---|---|
ISSN: | 1551-2541 2378-928X |
DOI: | 10.1109/MLSP.2011.6064569 |
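
The summary states that each sentence, rather than each document, is assigned a topic path in the nCRP tree. The following is a minimal sketch of that idea, not the authors' implementation: it draws paths from a plain nCRP prior only and leaves out the word-level HDP and the Gibbs sampler. The function name `sample_path`, the fixed tree depth, the concentration parameter `gamma`, and the sentence counts are all assumptions made for illustration.

```python
import random


def sample_path(tree, depth, gamma, rng):
    """Draw one root-to-leaf path from a nested CRP prior (illustrative only).

    tree  : dict mapping a node (tuple of child indices) to a list holding,
            for each existing child, how many earlier paths went through it.
    depth : number of levels in the path, root included (assumed fixed).
    gamma : concentration parameter (> 0); larger values open new branches
            more often.
    """
    node = ()              # the root is the empty tuple
    path = [node]
    for _ in range(depth - 1):
        child_counts = tree.setdefault(node, [])
        # An existing child is chosen in proportion to its count,
        # a brand-new child in proportion to gamma.
        weights = child_counts + [gamma]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(child_counts):
            child_counts.append(0)   # open a new branch
        child_counts[choice] += 1
        node = node + (choice,)
        path.append(node)
    return path


rng = random.Random(0)
tree = {}                        # one tree shared by all documents
sentences_per_doc = [4, 3, 5]    # made-up document lengths

# Every sentence draws its own path, so one document can span several
# branches of the tree instead of being tied to a single path.
doc_paths = [
    [sample_path(tree, depth=3, gamma=1.0, rng=rng) for _ in range(n)]
    for n in sentences_per_doc
]

for d, paths in enumerate(doc_paths):
    leaves = sorted({p[-1] for p in paths})
    print(f"document {d}: {len(paths)} sentences, leaf nodes {leaves}")
```

Drawing one path per document instead of one per sentence would recover the single-path document representation that the summary contrasts with HTSM.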