Improved Features and Models for Detecting Edit Disfluencies in Transcribing Spontaneous Mandarin Speech

Bibliographic details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2009-09, Vol. 17 (7), pp. 1263-1278
Main authors:	Lin, Che-Kuang; Lee, Lin-Shan
Format:	Article
Language:	English
Subjects:	Applied sciences; Character recognition; Decision trees; Digital multimedia broadcasting; Edit disfluency; Entropy; Exact sciences and technology; Humans; Information systems; Information, signal and communications theory; Interruption point detection; Mandarins; Maximum entropy; Natural language processing; Natural languages; Probability theory; Prosody; Recognition; Signal processing; Speech; Speech analysis; Speech enhancement; Speech processing; Speech recognition; Spontaneous; Spontaneous speech; Telecommunications and information theory

Description
Abstract:	Detection of edit disfluencies is key to transcribing spontaneous utterances. In this paper, we present improved features and models to detect edit disfluencies and enhance transcription of spontaneous Mandarin speech using hypothesized disfluency interruption points (IPs) and edit word detection. A comprehensive set of prosodic features that takes into account the special characteristics of edit disfluencies in Mandarin is developed, and an improved model combining decision trees and maximum entropy is proposed to detect IPs. This model is further adapted to desired prosodic conditions by latent prosodic modeling, a probabilistic framework for analyzing speech prosody in terms of a set of latent prosodic states. These techniques contribute to higher recognition accuracy (by rescoring with the hypothesized IPs) and better edit word detection (using conditional random fields defined on Chinese characters) in the final transcription, as verified by experiments on a spontaneous Mandarin speech corpus.
ISSN:	1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI:	10.1109/TASL.2009.2014792