Structure Learning of Continuous Speech Based on Unsupervised Segmentation

Bibliographic Details
Published in: Journal of the Robotics Society of Japan, 2023, Vol. 41(3), pp. 318-321
Main Authors: Nagano, Masatoshi; Nakamura, Tomoaki
Format: Article
Language: English; Japanese
Subjects:
Online Access: Full text
Description
Summary: Humans can segment perceived continuous speech signals into phonemes and words, which form a double articulation structure, without explicit boundary points or labels, and thereby learn language. Learning such a double articulation structure of speech signals is important for realizing a robot that can acquire vocabulary and hold a conversation. In this paper, we propose a novel statistical model, GP-HSMM-DAA (Gaussian Process Hidden Semi-Markov Model-based Double Articulation Analyzer), which learns the double articulation structure of time-series data by connecting statistical models hierarchically. In the proposed model, the parameters of each statistical model are mutually updated and learned complementarily. We show that GP-HSMM-DAA can segment continuous speech into phonemes and words more accurately than the baseline methods.
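The abstract gives no implementation details; the Python sketch below is only an assumption-based illustration of the hierarchical, complementary update scheme it describes (a phoneme-level model and a word-level model refining each other), not the authors' method. PhonemeSegmenter, WordAnalyzer, and learn_double_articulation are hypothetical placeholder names introduced here for illustration.

# Assumption-based sketch of the mutual-update idea in the abstract.
# Not the paper's implementation; all names below are hypothetical.

class PhonemeSegmenter:
    """Placeholder for a phoneme-level segmenter such as a GP-HSMM."""
    def fit(self, features, word_boundaries=None):
        # Re-estimate emission/duration parameters; word-level boundaries,
        # if given, bias the phoneme-level segmentation (complementary update).
        pass

    def segment(self, features):
        # Return phoneme-like unit sequences with their boundaries.
        return []


class WordAnalyzer:
    """Placeholder for a word-level model that chunks phoneme sequences into word-like units."""
    def fit(self, phoneme_sequences):
        pass

    def segment(self, phoneme_sequences):
        # Return word-like boundaries over the phoneme sequences.
        return []


def learn_double_articulation(features, n_iters=10):
    """Alternately update the two stacked models so they learn complementarily."""
    phoneme_model, word_model = PhonemeSegmenter(), WordAnalyzer()
    word_boundaries = None
    for _ in range(n_iters):
        phoneme_model.fit(features, word_boundaries)      # phoneme level (lower layer)
        phonemes = phoneme_model.segment(features)
        word_model.fit(phonemes)                          # word level (upper layer)
        word_boundaries = word_model.segment(phonemes)    # fed back to the lower layer
    return phoneme_model, word_model

In this reading, each pass lets the word-level boundaries constrain the next phoneme-level update, which is one plausible interpretation of the "mutually updated and learned complementarily" parameters mentioned in the abstract.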
ISSN: 0289-1824, 1884-7145
DOI: 10.7210/jrsj.41.318