Statistical language modeling for speech disfluencies


Bibliographic details
Main authors: Stolcke, A., Shriberg, E.
Format: Conference paper
Language: English
Description
Abstract: Speech disfluencies (such as filled pauses, repetitions, and restarts) are among the characteristics distinguishing spontaneous speech from planned or read speech. We introduce a language model that predicts disfluencies probabilistically and uses an edited, fluent context to predict following words. The model is based on a generalization of the standard N-gram language model. It uses dynamic programming to compute the probability of a word sequence, taking into account possible hidden disfluency events. We analyze the model's performance for various disfluency types on the Switchboard corpus. We find that the model reduces the word perplexity in the neighborhood of disfluency events; however, overall differences are small and have no significant impact on recognition accuracy. We also note that modeling the most frequent type of disfluency, filled pauses, requires a segmentation of utterances into linguistic (rather than acoustic) units. Our analysis illustrates a generally useful technique for language model evaluation based on local perplexity comparisons.
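The hidden-event dynamic program the abstract describes can be sketched as follows. This is a toy illustration only, with made-up bigram probabilities, a single disfluency type (filled pauses), and a hypothetical filled-pause prior `P_FP`; the paper's actual model, parameters, and disfluency inventory differ. The key idea shown is that a filler word can be generated by the disfluency branch and removed from the history, so later words are predicted from the edited, fluent context:

```python
from collections import defaultdict

# Toy bigram probabilities (hypothetical values, for illustration only).
BIGRAM = {
    ("<s>", "i"): 0.4, ("i", "think"): 0.3, ("think", "so"): 0.2,
    ("i", "uh"): 0.05, ("uh", "think"): 0.01,
}
P_FP = 0.05                # assumed prior probability of a filled-pause event
FILLED_PAUSES = {"uh", "um"}

def bigram_p(history, word):
    """Bigram probability with a small floor for unseen pairs."""
    return BIGRAM.get((history, word), 1e-4)

def sequence_prob(words):
    """Forward dynamic program over hidden disfluency events.

    State = last *fluent* word of the edited context, mapped to the
    accumulated probability of reaching that state. A filled pause may be
    emitted by the disfluency model without entering the history, so the
    word after it is predicted from the fluent context.
    """
    states = {"<s>": 1.0}
    for w in words:
        new_states = defaultdict(float)
        for history, p in states.items():
            if w in FILLED_PAUSES:
                # Disfluency branch: emit the filler, keep fluent history.
                new_states[history] += p * P_FP
            # Fluent branch: ordinary bigram transition into word w.
            new_states[w] += p * (1 - P_FP) * bigram_p(history, w)
        states = dict(new_states)
    # Total probability sums over all remaining hidden-state hypotheses.
    return sum(states.values())

# "i uh think": the edited-context path (i -> think, filler skipped)
# dominates the literal bigram path (i -> uh -> think).
print(sequence_prob(["i", "uh", "think"]))
```

Summing over both branches at each position is what the abstract means by "taking into account possible hidden disfluency events": no hard decision is made about whether "uh" is a filler, yet the edited context still lets "think" be predicted from "i".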
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.1996.541118