Comparative study of several distortion measures for speech recognition



Bibliographic details
Published in: Speech Communication, 1985-12, Vol. 4 (4), p. 317-331
Main authors: Nocerino, N.; Soong, F.K.; Rabiner, L.R.; Klatt, D.H.
Format: Article
Language: English
Description
Summary: Local spectral distortion measures are commonly used to measure the similarity (or spectral distance) between two given short-time spectra. In this study we compared several spectral distortion measures, including the Itakura-Saito distortion measure, the log likelihood ratio (LLR) distortion measure, the likelihood ratio (LR) distortion measure, the cepstral (CEP) distortion measure, and two proposed perceptually based distortion measures, the weighted likelihood ratio (WLR) and the weighted slope metric (WSM) distortion measures, in terms of their effects on the performance of a standard dynamic time warping (DTW) based, isolated-word speech recognizer. Two modifications of the basic form of each measure were also investigated, namely a Bark-scale frequency warping and the incorporation of suprasegmental energy information. All distortion measures and their modifications were tested on an alpha-digit vocabulary, 4-talker, telephone recording database. The results can be summarized as follows: (1) All LPC-based distortion measures performed reasonably well. The log likelihood ratio and weighted slope metric distortion measures gave the highest recognition accuracy, while the Itakura-Saito distortion measure gave the lowest score; (2) Whereas the addition of suprasegmental energy information helped the recognition performance, the use of gain and absolute loudness degraded the performance; (3) Bark-scale frequency warping did not, at least for the highly bandlimited telephone database we tested, perform as well as its unwarped counterpart; (4) The weighted likelihood ratio distortion measure did not perform as well as its unweighted counterpart.
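The role of a local distortion measure inside a DTW-based isolated-word recognizer can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the function names (cepstral_distance, dtw_distance, recognize) are hypothetical, the cepstral measure is approximated by a plain Euclidean distance between cepstral vectors, and the warping uses the simplest symmetric step pattern with no slope constraints or path-length normalization. The record does not state which definitions or constraints the study used.

```python
import math


def cepstral_distance(c1, c2):
    """Local distortion between two frames (assumed form).

    Plain Euclidean distance between two cepstral coefficient vectors,
    used here as a stand-in for a cepstral (CEP) distortion measure.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def dtw_distance(test, ref, local=cepstral_distance):
    """Accumulated distortion along the best warping path.

    `test` and `ref` are sequences of frames (each frame a list of floats).
    `local` is any frame-to-frame distortion measure, so a different
    measure can be swapped in without changing the warping itself.
    """
    n, m = len(test), len(ref)
    inf = float("inf")
    # acc[i][j] = minimum accumulated distortion aligning test[:i] with ref[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local(test[i - 1], ref[j - 1])
            # Basic step pattern: horizontal, vertical, or diagonal move.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def recognize(test, templates, local=cepstral_distance):
    """Return the label of the reference template with the lowest DTW distortion."""
    return min(templates, key=lambda w: dtw_distance(test, templates[w], local))
```

Because the local measure is passed in as a parameter, comparing distortion measures in the way the summary describes amounts to calling recognize with different `local` functions on the same templates and test utterances.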
ISSN: 0167-6393, 1872-7182
DOI: 10.1016/0167-6393(85)90057-3