Training for long-form speech recognition


Bibliographic details
Main authors: STROMAN TREVOR, LU ZHIYUN, PRABHAKAR, ROHIT, DUTT, THIBAUT, PAN YANWEI, ZHANG CHAO, CAO LIANGLIANG
Format: Patent
Language: Chinese; English
Description
Abstract: A method (700) includes obtaining training samples (400). Each training sample includes a sequence of speech segments (405) corresponding to a training utterance and a corresponding sequence of ground-truth transcriptions (415) of the sequence of speech segments, where each ground-truth transcription includes a start time (414) and an end time (416) of the corresponding speech segment. For each of the training samples, the method includes processing the corresponding sequence of speech segments using a speech recognition model (200) to obtain one or more speech recognition hypotheses (522) for the training utterance and, for each speech recognition hypothesis obtained for the training utterance, identifying a respective number of word errors relative to the corresponding sequence of ground-truth transcriptions. The method also includes training the speech recognition model to minimize word error rate based on the respective number of word errors identified for each speech recognition hypothesis obtained for the training utterance.
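The abstract does not state how word errors are counted or which loss is minimized. The sketch below is an illustration under stated assumptions, not the patented method: it assumes word errors are the word-level Levenshtein distance (substitutions + insertions + deletions), that the segment transcriptions are joined in start-time order into one long-form reference, and that "minimizing word error rate" is realized as a minimum-word-error-rate (MWER) style loss, i.e. the expected number of word errors over the N-best hypotheses with probabilities renormalized over that list. `Segment`, `reference_text`, `word_errors`, and `mwer_loss_and_grad` are hypothetical names, and the scores are stand-ins for model log-probabilities rather than outputs of the speech recognition model (200).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    """Hypothetical container: one speech segment's ground-truth transcription."""
    text: str
    start_time: float  # seconds (assumed unit), cf. start time (414)
    end_time: float    # seconds (assumed unit), cf. end time (416)


def reference_text(segments: Sequence[Segment]) -> str:
    """Assumption: join segment transcriptions in start-time order into one reference."""
    return " ".join(s.text for s in sorted(segments, key=lambda s: s.start_time))


def word_errors(hypothesis: str, reference: str) -> int:
    """Word-level Levenshtein distance between a hypothesis and the reference."""
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j]: edit distance between the hypothesis words processed so far
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,                 # insertion: extra hypothesis word
                curr[j - 1] + 1,             # deletion: missing reference word
                prev[j - 1] + int(h != r),   # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def mwer_loss_and_grad(
    scores: Sequence[float], errors: Sequence[int]
) -> Tuple[float, List[float]]:
    """MWER-style loss over an N-best list and its gradient w.r.t. the scores.

    scores: log-scores of each hypothesis (e.g. summed log-probabilities).
    errors: word errors of each hypothesis against the reference.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]              # renormalized over the N-best only
    expected = sum(p * w for p, w in zip(probs, errors))
    mean_err = sum(errors) / len(errors)       # constant baseline; centers the loss
    loss = expected - mean_err
    # d loss / d s_k = P_k * (W_k - E_P[W]): hypotheses with more errors than
    # the expectation get a positive gradient, so descent lowers their scores.
    grad = [p * (w - expected) for p, w in zip(probs, errors)]
    return loss, grad


if __name__ == "__main__":
    segments = [
        Segment("turn the lights", 0.0, 1.2),
        Segment("off in the kitchen", 1.2, 2.5),
    ]
    ref = reference_text(segments)
    nbest = [
        "turn the lights off in the kitchen",
        "turn the light off in kitchen",
        "turn the lights of in the kitchen",
    ]
    errors = [word_errors(h, ref) for h in nbest]  # [0, 2, 1]
    scores = [-2.0, -1.5, -3.0]
    loss, grad = mwer_loss_and_grad(scores, errors)
    # One plain gradient-descent step on the scores themselves (toy stand-in
    # for backpropagating into model parameters).
    new_scores = [s - 0.5 * g for s, g in zip(scores, grad)]
    new_loss, _ = mwer_loss_and_grad(new_scores, errors)
    print(errors, round(loss, 4), round(new_loss, 4))
```

Because the mean error count is a constant with respect to the scores, subtracting it leaves the gradient unchanged; it only centers the loss. In a real training setup the gradient would flow from the hypothesis scores back into the model's parameters rather than being applied to the scores directly.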