APPARATUS AND METHOD FOR SPEECH SYNTHESIS USING VOICE COLOR CONVERSION AND SPEECH DNA CODES

Bibliographic Details
Main authors:	KIM, HOI RIN; SUH, YOUNG JOO
Format:	Patent
Language:	English; Korean
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	Disclosed are a voice synthesis apparatus and a voice synthesis method that convert the voice model of a reference speaker, on which an automatic voice synthesis apparatus is built, into a voice model of a target speaker representing the tone color of a specific or virtual target speaker, using voice DNA information extracted from a user's voice data, and then synthesize the target speaker's voice as a waveform. The voice synthesis apparatus according to an embodiment can comprise: a tone color converter, which applies a speaker adaptation technique, using voice data collected from arbitrary speakers, to the voice model used for synthesizing speech in the reference speaker's tone color, thereby converting it into a voice model of a specific speaker; a voice DNA encoder, which encodes into voice DNA information either the tone color conversion information that the tone color converter uses to make the voice model represent the specific speaker's tone color, or the converted voice model of the specific speaker; a voice DNA recombiner, which recombines the voice DNA information of a plurality of specific speakers in suitable ratios to synthesize voice DNA information for a virtual target speaker; a voice DNA decoder, which decodes the voice DNA information obtained by the voice DNA encoder or the voice DNA recombiner; a target speaker voice model generator, which generates a target speaker voice model from the reference speaker's voice model using the target speaker's tone color conversion information reconstructed by the voice DNA decoder; and a voice synthesizer, which synthesizes a voice waveform corresponding to arbitrary input text by applying the target speaker voice model reconstructed by the voice DNA decoder or generated by the target speaker voice model generator.
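The present invention discloses a voice synthesis apparatus and method for automatic voice synthesis in which the voice model of a reference speaker, which constitutes the voice synthesis apparatus, is tone-color converted into a target speaker voice model representing the tone color of a specific or virtual target speaker, using voice DNA information extracted from a user's voice data, and the target speaker's voice is synthesized as a waveform. A voice synthesis apparatus according to an embodiment includes: a tone color converter, which applies a speaker adaptation technique, using voice data collected from arbitrary speakers, to the voice model for synthesizing speech in the reference speaker's tone color, converting it into a specific speaker voice model; a voice DNA encoder, which encodes into voice DNA information either the tone color conversion information that converts the voice model, by means of the tone color converter, to represent the specific speaker's tone color, or the converted specific speaker voice model; a voice DNA recombiner, which recombines the voice DNA information of a plurality of specific speakers in suitable ratios to synthesize voice DNA information for a virtual target speaker; a voice DNA decoder, which decodes the voice DNA information obtained using the voice DNA encoder or the voice DNA recombiner; using the voice DNA decoder ...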
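
The recombination and model-generation steps described in the summary can be illustrated with a minimal Python sketch. The record does not say how voice DNA information is represented or how the recombiner mixes it, so everything here is an assumption made for illustration: `VoiceDNA` is a hypothetical container holding a per-dimension scale and shift (a simple affine tone color conversion), `recombine` is taken to be a ratio-weighted average, and `apply_dna` is taken to transform the mean vectors of the reference speaker's model. None of these names or choices come from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VoiceDNA:
    """Hypothetical voice DNA: a per-dimension affine conversion (assumption)."""
    scale: List[float]
    shift: List[float]


def recombine(dnas: Sequence[VoiceDNA], ratios: Sequence[float]) -> VoiceDNA:
    """Mix several speakers' voice DNA by ratio (assumed: weighted average)."""
    if not dnas or len(dnas) != len(ratios):
        raise ValueError("need one ratio per voice DNA entry")
    total = sum(ratios)
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")
    weights = [r / total for r in ratios]  # normalize so the weights sum to 1
    dim = len(dnas[0].scale)
    scale = [sum(w * d.scale[i] for w, d in zip(weights, dnas)) for i in range(dim)]
    shift = [sum(w * d.shift[i] for w, d in zip(weights, dnas)) for i in range(dim)]
    return VoiceDNA(scale, shift)


def apply_dna(reference_means: Sequence[Sequence[float]], dna: VoiceDNA) -> List[List[float]]:
    """Generate target speaker means from reference speaker means (assumed affine map)."""
    return [
        [s * x + b for x, s, b in zip(mean, dna.scale, dna.shift)]
        for mean in reference_means
    ]


# Two specific speakers mixed 3:1 into a virtual target speaker.
speaker_a = VoiceDNA(scale=[1.0, 2.0], shift=[0.0, 0.0])
speaker_b = VoiceDNA(scale=[2.0, 4.0], shift=[4.0, 8.0])
virtual = recombine([speaker_a, speaker_b], ratios=[3, 1])
target_means = apply_dna([[1.0, 1.0]], virtual)
```

In this sketch only the compact conversion parameters are mixed, and the reference model itself is left untouched until `apply_dna` is called. That mirrors the split the summary draws between the voice DNA recombiner and the target speaker voice model generator, though a real system would likely use a richer transform and model.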