Training for long-form speech recognition

Bibliographic details
Main authors:	STROMAN TREVOR, LU ZHIYUN, PRABHAKAR, ROHIT, DUTT, THIBAUT, PAN YANWEI, ZHANG CHAO, CAO LIANGLIANG
Format:	Patent
Language:	Chinese; English
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Abstract:	A method (700) includes obtaining training samples (400), each training sample including a corresponding sequence of speech segments (405) corresponding to a training utterance and a corresponding sequence of ground-truth transcriptions (415) of the sequence of speech segments, each ground-truth transcription including a start time (414) and an end time (416) of the corresponding speech segment. For each of the training samples, the method includes: processing the corresponding sequence of speech segments using a speech recognition model (200) to obtain one or more speech recognition hypotheses (522) for the training utterance; and, for each speech recognition hypothesis obtained for the training utterance, identifying a respective number of word errors relative to the corresponding sequence of ground-truth transcriptions. The method also includes training the speech recognition model to minimize word error rate based on the respective number of word errors identified for each speech recognition hypothesis obtained for the training utterance.
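
The training objective summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how word errors are counted or how they are turned into a loss, so the sketch assumes a word-level edit distance and an expected-word-error loss over an N-best list, with hypothesis probabilities renormalized over the list and the mean error subtracted as a baseline. The function names `word_errors` and `expected_word_error_loss`, the example sentences, and the probabilities are all hypothetical.

```python
import math
from typing import List, Sequence


def word_errors(hypothesis: Sequence[str], reference: Sequence[str]) -> int:
    """Count word errors as a word-level edit distance (assumed metric)."""
    prev = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, start=1):
        curr = [i]
        for j, ref_word in enumerate(reference, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # extra word in the hypothesis
                            curr[j - 1] + 1,      # reference word missing
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def expected_word_error_loss(log_probs: List[float], errors: List[int]) -> float:
    """Expected word errors over an N-best list (assumed loss form).

    Probabilities are renormalized over the list, and the mean error is
    subtracted as a baseline; both choices are assumptions of this sketch.
    """
    top = max(log_probs)
    weights = [math.exp(lp - top) for lp in log_probs]
    total = sum(weights)
    posteriors = [w / total for w in weights]
    mean_error = sum(errors) / len(errors)
    return sum(p * (e - mean_error) for p, e in zip(posteriors, errors))


# Hypothetical training sample: two speech segments with their transcriptions.
segment_transcriptions = [
    "turn on the lights".split(),
    "and play music".split(),
]
# The reference for the whole utterance is the concatenated segment sequence.
reference = [word for segment in segment_transcriptions for word in segment]

# Hypothetical N-best hypotheses and model probabilities for the utterance.
hypotheses = [
    "turn on the light and play music".split(),
    "turn on the lights and play music".split(),
    "turn the lights and play".split(),
]
log_probs = [math.log(0.5), math.log(0.3), math.log(0.2)]

errors = [word_errors(hyp, reference) for hyp in hypotheses]
loss = expected_word_error_loss(log_probs, errors)
```

In an actual training loop the loss would be computed on differentiable model scores so that gradients shift probability mass toward hypotheses with fewer word errors; the pure-Python version above only shows the arithmetic.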