Acoustic model for performing section-by-section learning based on alignment information and speech recognition apparatus including the same

Bibliographic details
Main authors:	LEE SANG EON, LEE MUN HAK, SEONG JU SEOK, CHANG JOON HYEOK
Format:	Patent
Language:	English; Korean
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
Description
Summary:	In one embodiment, an acoustic model that performs section-by-section training based on alignment information includes: a first acoustic model including a first artificial neural network module that takes voice information as input and outputs alignment information corresponding to the voice information; and a second acoustic model including a second artificial neural network module that takes the voice information as input and outputs text information corresponding to the voice information. The second acoustic model can be trained by generating, from the alignment information, the reference data needed to train the second acoustic model, and then adjusting the parameters of the second acoustic model using a loss function generated from the reference data and the text information. The invention can thereby prevent the acoustic model from overfitting.
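
The summary does not specify the form of the alignment information, the reference data, or the loss function. As one possible reading only, the sketch below assumes that the alignment is a list of (token, start frame, end frame) sections, that the reference data is one target label per frame, and that the loss is a per-section mean negative log-likelihood. The names `build_reference`, `section_losses`, and the blank label `<b>` are hypothetical, and the parameter update of the second acoustic model is omitted.

```python
import math

def build_reference(alignment, num_frames, blank="<b>"):
    """Expand (token, start, end) sections into one target label per frame.

    Assumption: frames not covered by any section get a blank label.
    """
    reference = [blank] * num_frames
    for token, start, end in alignment:
        for t in range(start, end):
            reference[t] = token
    return reference

def section_losses(posteriors, alignment):
    """Mean negative log-likelihood of each aligned token over its own section."""
    losses = []
    for token, start, end in alignment:
        nll = -sum(math.log(posteriors[t][token]) for t in range(start, end))
        losses.append(nll / (end - start))
    return losses

# Hypothetical alignment, standing in for the first acoustic model's output:
# token "h" spans frames 0-1 and token "i" spans frames 2-4.
alignment = [("h", 0, 2), ("i", 2, 5)]
reference = build_reference(alignment, num_frames=6)

# Hypothetical per-frame posteriors, standing in for the second acoustic
# model's output (uniform here, so every section has the same loss).
vocab = ["h", "i", "<b>"]
posteriors = [{v: 1.0 / len(vocab) for v in vocab} for _ in range(6)]
losses = section_losses(posteriors, alignment)
```

Computing one loss value per section, rather than a single value for the whole utterance, is what makes this an illustration of section-by-section training; a real system would backpropagate these values to adjust the second model's parameters.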