Probabilistic topic models for sequence data

Bibliographic details
Published in:	Machine learning 2013-10, Vol.93 (1), p.5-29
Main authors:	Barbieri, Nicola; Manco, Giuseppe; Ritacco, Ettore; Carnuccio, Marco; Bevacqua, Antonio
Format:	Article
Language:	English
Subjects:	Allocations Applied sciences Artificial Intelligence Collaboration Computer Science Computer science control theory systems Computer systems and distributed systems. User interface Control Data processing. List processing. Character string processing Dirichlet problem Exact sciences and technology Markov models Mechatronics Memory organisation. Data processing Natural Language Processing (NLP) Probabilistic methods Probability Probability theory Recall Robotics Sampling Simulation and Modeling Software Speech and sound recognition and synthesis. Linguistics Texts
Online access:	Full text

Description
Summary:	Probabilistic topic models are widely used in different contexts to uncover the hidden structure in large text corpora. One of the main (and perhaps strong) assumptions of these models is that the generative process follows a bag-of-words assumption, i.e., each token is independent of the previous one. We extend the popular Latent Dirichlet Allocation model by exploiting three different conditional Markovian assumptions: (i) the token generation depends on the current topic and on the previous token; (ii) the topic associated with each observation depends on the topic associated with the previous one; (iii) the token generation depends on the current and previous topic. For each of these modeling assumptions we present a Gibbs Sampling procedure for parameter estimation. Experimental evaluation over real-world data shows the performance advantages, in terms of recall and precision, of the sequence-modeling approaches.
ISSN:	0885-6125 1573-0565
DOI:	10.1007/s10994-013-5391-2
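
Illustrative sketch (not part of the catalog record or the article): the summary's assumption (i), where each token depends on the current topic and on the previous token, can be pictured as a small generative simulation. Everything below is an assumption made for illustration: the function name `generate_document`, the symmetric Dirichlet hyperparameters `alpha` and `beta`, the separate start distribution `phi_start` for the first token, and the choice to draw topics independently per token as in plain LDA. It only samples from such a model; it is not the authors' Gibbs sampling procedure.

```python
import numpy as np


def generate_document(n_tokens, n_topics, vocab_size, alpha=0.5, beta=0.5, seed=0):
    """Sample one document from a toy topic model with token-level Markov dependence.

    Hypothetical sketch of assumption (i): the token at position n is drawn
    from a distribution indexed by (current topic, previous token).
    """
    rng = np.random.default_rng(seed)

    # Per-document topic mixture, as in standard LDA.
    theta = rng.dirichlet(np.full(n_topics, alpha))

    # phi[k, v] is a distribution over the next token, given topic k and
    # previous token v. Shape: (n_topics, vocab_size, vocab_size).
    phi = rng.dirichlet(np.full(vocab_size, beta), size=(n_topics, vocab_size))

    # The first token has no predecessor, so it uses a per-topic start
    # distribution (an assumption of this sketch).
    phi_start = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

    topics, tokens = [], []
    prev = None
    for _ in range(n_tokens):
        z = int(rng.choice(n_topics, p=theta))  # topic for this position
        dist = phi_start[z] if prev is None else phi[z, prev]
        w = int(rng.choice(vocab_size, p=dist))  # token given topic and previous token
        topics.append(z)
        tokens.append(w)
        prev = w
    return topics, tokens


if __name__ == "__main__":
    topics, tokens = generate_document(n_tokens=12, n_topics=3, vocab_size=8)
    print("topics:", topics)
    print("tokens:", tokens)
```

Under the same hedges, assumption (ii) would instead draw each topic from a distribution conditioned on the previous topic, and assumption (iii) would index the token distribution by the pair (current topic, previous topic) rather than by the previous token.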