CHUNKING AND OVERLAP DECODING STRATEGY FOR STREAMING RNN TRANSDUCERS FOR SPEECH RECOGNITION


Bibliographic details
Main author:	SAON, George Andrei
Format:	Patent
Language:	eng ; fre
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Abstract:	A computer-implemented method is provided for improving the accuracy of digital speech recognition. The method includes receiving the digital speech. The method further includes splitting the digital speech into overlapping chunks. The method also includes computing a bidirectional encoder embedding of each of the overlapping chunks to obtain bidirectional encoder embeddings. The method additionally includes combining the bidirectional encoder embeddings. The method further includes interpreting, by a speech recognition system, the digital speech using the combined bidirectional encoder embeddings.
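
The steps named in the abstract (split into overlapping chunks, encode each chunk bidirectionally, combine the embeddings) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names, the chunk and overlap sizes, the toy encoder (running means from the left and from the right, standing in for a trained bidirectional network), and the choice to combine by averaging the embeddings of frames that fall in more than one chunk. The abstract does not say how the embeddings are combined.

```python
from typing import List, Tuple


def split_into_chunks(num_frames: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges; consecutive ranges share `overlap` frames."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    spans = []
    start = 0
    while start < num_frames:
        end = min(start + chunk_size, num_frames)
        spans.append((start, end))
        if end == num_frames:
            break
        start += step
    return spans


def toy_bidirectional_encoder(chunk: List[float]) -> List[List[float]]:
    """Hypothetical stand-in for a bidirectional encoder.

    Each frame gets [mean of frames up to it, mean of frames from it onward],
    so every embedding depends on both left and right context in the chunk.
    """
    n = len(chunk)
    fwd, total = [], 0.0
    for i, x in enumerate(chunk):
        total += x
        fwd.append(total / (i + 1))
    bwd, total = [0.0] * n, 0.0
    for i in range(n - 1, -1, -1):
        total += chunk[i]
        bwd[i] = total / (n - i)
    return [[f, b] for f, b in zip(fwd, bwd)]


def encode_with_overlap(frames: List[float], chunk_size: int, overlap: int) -> List[List[float]]:
    """Encode each overlapping chunk, then average embeddings of shared frames.

    Averaging is an assumed combination rule, chosen only for illustration.
    """
    dim = 2  # the toy encoder emits two values per frame
    sums = [[0.0] * dim for _ in frames]
    counts = [0] * len(frames)
    for start, end in split_into_chunks(len(frames), chunk_size, overlap):
        embeddings = toy_bidirectional_encoder(frames[start:end])
        for offset, vec in enumerate(embeddings):
            t = start + offset
            counts[t] += 1
            for d in range(dim):
                sums[t][d] += vec[d]
    return [[v / counts[t] for v in sums[t]] for t in range(len(frames))]


if __name__ == "__main__":
    print(split_into_chunks(10, chunk_size=4, overlap=2))
    print(encode_with_overlap([0.0, 2.0, 4.0], chunk_size=2, overlap=1))
```

In this sketch the combined sequence has one embedding per input frame, which is what a downstream recognizer would consume. A real system would replace the toy encoder with a trained network and might combine overlapping regions differently, for example by keeping only the central frames of each chunk.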