ACOUSTIC MODEL CONDITIONING ON SOUND FEATURES

Bibliographic details
Main authors:	MOHAJER KEYVAN, GOWAYYED ZIZU
Format:	Patent
Language:	Chinese; English
Subject terms:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Abstract:	The invention relates to an acoustic model conditioned on sound features. Systems and methods of speech recognition capture segments of speech audio in which a key phrase is shortly followed by an utterance. An encoder computes a sound embedding from the key phrase segment, and the embedding is stored. An acoustic model for speech recognition infers phonemes from the utterance audio signal, taking the sound embedding as a conditioning input. The sound embedding may be held until another key phrase is captured or the session ends. The acoustic model and encoder may be trained jointly on speech data recordings that may be mixed with noise, with the same noise profile applied to both the key phrase segment and the utterance segment.
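The flow described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration, not the patented implementation. The encoder is plain mean pooling over key-phrase feature frames. The acoustic model is a single hypothetical linear layer applied to each utterance frame concatenated with the stored embedding. The shared noise profile is modeled as zero-mean Gaussian noise with one standard deviation, added directly to feature values rather than to the waveform. Names such as `encode_key_phrase`, `acoustic_model`, `Session` and `mix_same_noise` are hypothetical. Joint training of the encoder and acoustic model is not shown.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical feature type: one vector (e.g. log filterbank energies) per audio frame.
Frame = List[float]


def encode_key_phrase(frames: List[Frame]) -> Frame:
    """Hypothetical encoder: mean-pool key-phrase frames into one fixed-size sound embedding."""
    if not frames:
        raise ValueError("key phrase segment has no frames")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def acoustic_model(frames: List[Frame], embedding: Frame,
                   weights: List[List[float]], phonemes: List[str]) -> List[str]:
    """Hypothetical conditioned acoustic model: one linear layer per frame.

    Each utterance frame is concatenated with the sound embedding, so the same
    audio can score differently under different embeddings. Returns the
    highest-scoring phoneme per frame (argmax of the logits; a softmax would
    not change which phoneme wins).
    """
    result = []
    for frame in frames:
        x = frame + embedding  # list concatenation forms the conditioned input
        logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
        result.append(phonemes[logits.index(max(logits))])
    return result


@dataclass
class Session:
    """Holds the sound embedding until a new key phrase arrives or the session ends."""
    embedding: Optional[Frame] = None

    def on_key_phrase(self, frames: List[Frame]) -> None:
        # A newly captured key phrase replaces any previously stored embedding.
        self.embedding = encode_key_phrase(frames)

    def recognize(self, frames: List[Frame], weights: List[List[float]],
                  phonemes: List[str]) -> List[str]:
        if self.embedding is None:
            raise RuntimeError("no key phrase captured in this session")
        return acoustic_model(frames, self.embedding, weights, phonemes)

    def end(self) -> None:
        # Ending the session discards the stored embedding.
        self.embedding = None


def mix_same_noise(key_phrase: List[Frame], utterance: List[Frame],
                   noise_std: float, seed: int):
    """Training-data sketch: add noise from one profile to both segments.

    Assumption: zero-mean Gaussian noise with a single standard deviation
    stands in for a recorded noise profile; both segments share that profile.
    """
    rng = random.Random(seed)

    def add_noise(frames: List[Frame]) -> List[Frame]:
        return [[v + rng.gauss(0.0, noise_std) for v in f] for f in frames]

    return add_noise(key_phrase), add_noise(utterance)
```

For example, take weights `[[1, 0, 1, 0], [0, 1, 0, 1]]` and phonemes `["a", "b"]`. After a key phrase whose embedding is `[2.0, 1.0]`, the utterance frame `[0.5, 0.5]` is recognized as `a`. After a key phrase whose embedding is `[0.0, 3.0]`, the same frame is recognized as `b`. The audio is identical and only the conditioning changes. Concatenation is just one simple way to feed a conditioning vector into a frame-level model.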