APPARATUS, METHOD, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM FOR SPEAKER VOICE ANALYSIS

Bibliographic details
Main authors:	CHANG JUNHYUK, CHOI JUNGHWAN, JO JEIL
Format:	Patent
Language:	eng ; kor
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Abstract:	According to an embodiment, an apparatus for analyzing the voice of a speaker comprises: a feature extraction unit that extracts the mean and the standard deviation of a speaker's voice signal; a speaker classification unit that includes two parallel hidden layers not connected to each other, a weighted sum calculation unit, and a first output layer; and a channel classification unit that includes a gradient reversal layer (GRL), connected to one of the two hidden layers to receive the mean and the standard deviation from that hidden layer, and a second output layer that classifies the channel of the voice signal based on the data output from the GRL. Each of the two hidden layers receives the mean and the standard deviation as input. The weighted sum calculation unit calculates a weighted sum for each of the mean and the standard deviation, based on the results of passing the mean and the standard deviation through each of the two hidden layers. The first output layer can classify the speaker of the voice signal based on the calculated weighted sums. The apparatus can therefore analyze the speaker's voice through a deep neural network that addresses channel mismatch.
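
The data flow described in the abstract can be illustrated with a small forward-pass sketch in plain Python. This is only a reading of the abstract, not the patented implementation: the layer sizes, the tanh activation, the softmax outputs, the fixed mixing weights, and all names (GradientReversal, dense, forward, params) are assumptions introduced for illustration. In particular, the abstract speaks of a weighted sum for each of the mean and the standard deviation, which is simplified here to one weighted sum of the two branch outputs. The gradient reversal layer is shown in its usual form: it passes data through unchanged in the forward pass and negates the gradient in the backward pass, so that the shared branch is pushed toward channel-independent features.

```python
import math
import random


class GradientReversal:
    """Identity in the forward pass; negates and scales the gradient in the backward pass."""

    def __init__(self, scale=1.0):
        self.scale = scale  # hypothetical reversal strength, not given in the abstract

    def forward(self, x):
        return list(x)

    def backward(self, grad_output):
        return [-self.scale * g for g in grad_output]


def linear(x, weights, bias):
    """Affine map; weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def dense(x, weights, bias):
    """Affine map followed by tanh (the activation is an assumption)."""
    return [math.tanh(z) for z in linear(x, weights, bias)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init(rows, cols, rng):
    """Small random weight matrix for the demo."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def forward(mean, std, params, grl):
    stats = mean + std                        # both statistics feed both branches
    h_a = dense(stats, *params["hidden_a"])   # branch A
    h_b = dense(stats, *params["hidden_b"])   # branch B, not connected to branch A
    alpha, beta = params["mix"]               # assumed mixing weights
    fused = [alpha * a + beta * b for a, b in zip(h_a, h_b)]
    speaker_probs = softmax(linear(fused, *params["speaker_out"]))
    # The channel classifier sees branch B only through the gradient reversal layer.
    channel_probs = softmax(linear(grl.forward(h_b), *params["channel_out"]))
    return speaker_probs, channel_probs


# Demo with made-up sizes and inputs.
rng = random.Random(0)
dim, hidden, n_speakers, n_channels = 3, 4, 5, 2
params = {
    "hidden_a": (init(hidden, 2 * dim, rng), [0.0] * hidden),
    "hidden_b": (init(hidden, 2 * dim, rng), [0.0] * hidden),
    "mix": (0.5, 0.5),
    "speaker_out": (init(n_speakers, hidden, rng), [0.0] * n_speakers),
    "channel_out": (init(n_channels, hidden, rng), [0.0] * n_channels),
}
grl = GradientReversal(scale=1.0)
speaker_probs, channel_probs = forward([0.1, -0.2, 0.3], [1.0, 0.8, 1.2], params, grl)
print(speaker_probs, channel_probs)
```

In a real training setup an automatic differentiation framework would call the backward rule; the explicit backward method here only makes the sign flip visible.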