METHOD AND DEVICE FOR SPEECH PROCESSING

Bibliographic Details
Main Author:	CHAE JONG HOON
Format:	Patent
Language:	English; Korean
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	Disclosed are a voice processing method and a voice processing apparatus that perform voice processing by executing an on-board artificial intelligence (AI) algorithm and/or machine learning algorithm, so that a user terminal and a server can communicate with each other in a 5G communication environment. According to an embodiment of the present invention, the voice processing method may comprise the steps of: receiving a user utterance voice; using a user and dubbing artist mapping learning model to output a dubbing artist utterance voice that corresponds to the user utterance voice, rendered in the voice of the dubbing artist whose voice has the highest similarity to the user's voice; and performing voice recognition on the dubbing artist utterance voice. According to the present invention, voice recognition performance can be improved because the user utterance voice, received at the front end of voice recognition processing, is converted into the voice of the dubbing artist most similar to it before voice recognition is performed. (The record also carries the same abstract in Korean.)
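
The three steps in the summary (receive the utterance, re-voice it as the most similar dubbing artist, recognize the result) can be illustrated with a minimal Python sketch. This is not the patented implementation: the abstract does not say how similarity is measured or how the mapping learning model works internally. The sketch assumes, purely for illustration, that voices are compared by cosine similarity of speaker embeddings, and the names `embed`, `convert`, `recognize`, and `artist_embeddings` are hypothetical placeholders supplied by the caller.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_artist(user_embedding, artist_embeddings):
    """Return the name of the dubbing artist whose embedding is closest to the user's.

    Assumption: similarity is cosine similarity between speaker embeddings.
    The abstract only says "highest similarity" without naming a measure.
    """
    return max(
        artist_embeddings,
        key=lambda name: cosine_similarity(user_embedding, artist_embeddings[name]),
    )


def process_utterance(user_audio, embed, artist_embeddings, convert, recognize):
    """Run the three steps from the summary on one utterance.

    embed, convert and recognize are hypothetical callables standing in for
    a speaker encoder, a voice conversion model and a speech recognizer.
    """
    user_embedding = embed(user_audio)                   # describe the user's voice
    artist = most_similar_artist(user_embedding, artist_embeddings)
    artist_audio = convert(user_audio, artist)           # re-voice the utterance
    return artist, recognize(artist_audio)               # recognize the converted voice
```

In this sketch the selection and the conversion are separate calls so that each step is visible; the abstract describes a single user and dubbing artist mapping learning model that produces the dubbing artist utterance voice, so a real system may well fuse the two.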