Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons

Phoneme segmentation is a fundamental problem in many speech recognition and synthesis studies. Unsupervised phoneme segmentation assumes no knowledge on linguistic contents and acoustic models, and thus poses a challenging problem. The essential question here is what is the optimal segmentation. Th...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Yu Qiao, Shimomura, N., Minematsu, N.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Phoneme segmentation is a fundamental problem in many speech recognition and synthesis studies. Unsupervised phoneme segmentation assumes no knowledge on linguistic contents and acoustic models, and thus poses a challenging problem. The essential question here is what is the optimal segmentation. This paper formulates the optimal segmentation problem into a probabilistic framework. Using statistics and information theory analysis, we develop three different objective functions, namely, summation of square error (SSE), log determinant (LD) and rate distortion (RD). Specially, RD function is derived from information rate distortion theory and can be related to human signal perception mechanism. We introduce a time-constrained agglomerative clustering algorithm to find the optimal segmentations. We also propose an efficient method to implement the algorithm by using integration functions. We carry out experiments on TIMIT database to compare the above three objective functions. The results show that rate distortion achieves the best performance and indicate that our method outperforms the recently published unsupervised segmentation methods.
ISSN:1520-6149
2379-190X
DOI:10.1109/ICASSP.2008.4518528