SPEAKER EMBEDDING-BASED SPEAKER ADAPTATION METHOD AND SYSTEM GENERATED BY USING GLOBAL STYLE TOKENS AND PREDICTION MODEL

Bibliographic details
Main authors:	LEE, Jaeuk; CHANG, Joon-Hyuk
Format:	Patent
Language:	eng ; fre ; kor
Subjects:	ACOUSTICS; CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	Disclosed are a speaker adaptation method and system based on speaker embeddings generated by using global style tokens and a prediction model. According to an embodiment, the speaker adaptation method performed by the speaker adaptation system may comprise the steps of: generating, from a speaker embedding, a plurality of speaker embeddings representing the tone of a speaker by using a voice transformation model that includes a global style token mechanism; and predicting the final speaker embedding representing a new speaker through a similarity comparison between a new speaker embedding, predicted by a prediction model for predicting speaker embeddings, and the plurality of generated speaker embeddings.
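
The similarity comparison named in the summary could look roughly like the sketch below. The record does not say which similarity measure is used or how the final embedding is derived from the comparison, so the sketch makes two assumptions: cosine similarity as the measure, and selecting the single closest generated embedding as the result. The function names and the toy vectors are hypothetical and are not taken from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_final_embedding(predicted, candidates):
    """Pick the generated embedding closest to the predicted one.

    `predicted` stands in for the output of the prediction model and
    `candidates` for the embeddings generated via global style tokens.
    Returning the best match is an assumption made for illustration.
    """
    if not candidates:
        raise ValueError("at least one candidate embedding is required")
    scores = [cosine_similarity(predicted, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, candidates[best], scores[best]


# Toy 3-dimensional vectors; real speaker embeddings are much larger.
generated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
predicted_new_speaker = [0.5, 0.9, 0.1]
index, final_embedding, score = select_final_embedding(
    predicted_new_speaker, generated
)
```

With these toy values the third candidate is selected, because it points in nearly the same direction as the predicted vector.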