Incorporating information from syllable-length time scales into automatic speech recognition

Bibliographic details
Main authors: Su-Lin Wu; Kingsbury, E.D.; Morgan, N.; Greenberg, S.
Format: Conference proceedings
Language: English
Description
Summary: Including information distributed over intervals of syllabic duration (100-250 ms) may greatly improve the performance of automatic speech recognition (ASR) systems. ASR systems primarily use representations and recognition units covering phonetic durations (40-100 ms). Humans certainly use information at phonetic time scales, but results from psychoacoustics and psycholinguistics highlight the crucial role of the syllable, and syllable-length intervals, in speech perception. We compare the performance of three ASR systems: a baseline system that uses phone-scale representations and units, an experimental system that uses a syllable-oriented front-end representation and syllabic units for recognition, and a third system that combines the phone-scale and syllable-scale recognizers by merging and rescoring N-best lists. Using the combined recognition system, we observed an improvement in word error rate for telephone-bandwidth, continuous numbers from 6.8% to 5.5% on a clean test set, and from 27.8% to 19.6% on a reverberant test set, over the baseline phone-based system.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.1998.675366
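
The summary describes combining the phone-scale and syllable-scale recognizers by merging and rescoring N-best lists, without giving the scoring details. The following is a minimal sketch of that general idea, not the authors' method. The function name merge_and_rescore, the weighted sum of log scores, the phone_weight parameter, and the rule that a hypothesis missing from one list receives that list's worst score are all assumptions made for illustration, as are the example hypotheses and scores.

```python
def merge_and_rescore(phone_nbest, syllable_nbest, phone_weight=0.5):
    """Merge two N-best lists and rank hypotheses by a combined score.

    Each list holds (hypothesis, log_score) pairs; higher is better.
    Assumption: a hypothesis absent from one list is given that list's
    worst score, so it is penalized but not discarded.
    """
    phone = dict(phone_nbest)
    syllable = dict(syllable_nbest)
    phone_floor = min(phone.values())
    syllable_floor = min(syllable.values())

    combined = {}
    for hyp in set(phone) | set(syllable):
        p = phone.get(hyp, phone_floor)
        s = syllable.get(hyp, syllable_floor)
        # Assumed rescoring rule: linear interpolation of the two log scores.
        combined[hyp] = phone_weight * p + (1.0 - phone_weight) * s

    # Best (highest) combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical N-best lists for one utterance of continuous numbers.
phone_nbest = [("nine one five", -10.0), ("nine one nine", -11.0)]
syllable_nbest = [("nine one nine", -9.0), ("five one nine", -12.0)]

ranked = merge_and_rescore(phone_nbest, syllable_nbest)
best_hypothesis = ranked[0][0]
```

In this toy case the phone-scale recognizer alone would pick "nine one five", while the merged list ranks "nine one nine" first because the syllable-scale recognizer scores it highly. A real system would tune the interpolation weight on held-out data.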