Adaptively learning probabilistic deterministic automata from data streams

Bibliographic details
Published in:	Machine Learning, 2014-07, Vol. 96 (1-2), p. 99-127
Main authors:	Balle, Borja; Castro, Jorge; Gavaldà, Ricard
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial Intelligence; Computer Science; Constants; Control; Data streams; Data transmission; Formalism; Informàtica; Informàtica teòrica; Learning; Machine theory; Markov analysis; Markov processes; Markov, Processos de; Mechatronics; Màquines, Teoria de; Natural Language Processing (NLP); PAC learning; PDFA; Probabilistic automata; Probabilistic methods; Probability; Probability theory; Robotics; Simulation and Modeling; Stream sketches; Streams; Àrees temàtiques de la UPC
Online access:	Full text

Description
Abstract:	Markovian models with hidden state are widely used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for inferring models in this class in the restrictive data stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We also present extensions of the algorithm that (1) reduce to a minimum the need for guessing parameters of the target distribution and (2) are able to adapt to changes in the input distribution, relearning new models when needed. We provide rigorous PAC-like bounds for all of the above. Our algorithm makes key use of stream sketching techniques for reducing memory and processing time, and is modular in that it can use different tests for state equivalence and for change detection in the stream.
ISSN:	0885-6125; 1573-0565
DOI:	10.1007/s10994-013-5408-x
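
Illustration:	The abstract refers to stream sketching as the means of keeping memory sublinear in the stream length. As a rough illustration of that general idea only, the following is a minimal sketch of a generic frequent-items summary (the Space-Saving scheme) applied to string prefixes in a single pass. It is not taken from the article and should not be read as the authors' algorithm: the names `SpaceSaving` and `sketch_prefixes` and the parameters `k` and `max_len` are assumptions made for this example, and the linear scan used to find the smallest counter is chosen for brevity, so this version does not achieve constant time per item.

```python
class SpaceSaving:
    """Approximate item counts using at most k counters (generic Space-Saving summary)."""

    def __init__(self, k):
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.counts = {}  # item -> estimated count (never below the true count)
        self.errors = {}  # item -> upper bound on how much the estimate overshoots

    def update(self, item):
        """Process one stream item."""
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest counter and inherit its count.
            # Linear scan for brevity; a real implementation would use a
            # structure that finds the minimum in constant time.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def estimate(self, item):
        """Return the estimated count, or 0 if the item is not tracked."""
        return self.counts.get(item, 0)


def sketch_prefixes(stream, k, max_len):
    """One pass over a stream of strings, sketching prefixes up to max_len symbols.

    Hypothetical helper for this example: returns the sketch and the number
    of prefix updates performed.
    """
    sketch = SpaceSaving(k)
    updates = 0
    for s in stream:
        for i in range(1, min(len(s), max_len) + 1):
            sketch.update(s[:i])
            updates += 1
    return sketch, updates


if __name__ == "__main__":
    sketch, updates = sketch_prefixes(["ab", "ab", "ac", "b"], k=10, max_len=2)
    print(sketch.estimate("a"), sketch.estimate("ab"), updates)
```

With `k` larger than the number of distinct prefixes the counts are exact; with a smaller `k` the summary keeps only `k` counters, and each estimate can exceed the true count by at most the value stored in `errors`.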