Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation

Bibliographic Details
Published in:	KSII Transactions on Internet and Information Systems, 2022, 16(2), pp. 713-725
Main Authors:	Kwon, Hye-Jeong; Kim, Min-Jeong; Baek, Ji-Won; Chung, Kyungyong
Format:	Article
Language:	English
Subjects:	Artificial intelligence; Data mining; Emotions; Methods; Voice recognition; Computer science (컴퓨터학)
Online Access:	Full text

Description
Summary:	A preliminary version of this paper was presented at APIC-IST 2021 and was selected as an outstanding paper. This work was supported by the GRRC program of Gyeonggi province [GRRC KGU 2020-B03, Industry Statistics and Data Mining Research]. Artificial intelligence mostly shows no definite change in emotion, which makes it hard for it to demonstrate empathy when communicating with humans. Applying frequency modification to neutral emotions, or adding a different emotional frequency to them, makes it possible to develop artificial intelligence with emotions. This study proposes emotion conversion using voice frequency synthesis based on a Generative Adversarial Network (GAN). The proposed method extracts frequencies from the speech data of twenty-four actors and actresses; that is, it extracts the voice features of their different emotions, preserves the linguistic features, and converts only the emotion. It then generates a frequency with a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) in order to produce prosody while preserving linguistic information, which makes it possible to learn speech features in parallel. Finally, it corrects the frequency by applying Amplitude Scaling. Through spectral conversion on a logarithmic scale, the result is converted into a frequency that takes human hearing characteristics into account. Accordingly, the proposed technique provides emotion conversion of speech so that emotions can be expressed in artificially generated voices or speech. Keywords: Emotion Transformation, Generative Adversarial Network, Voice Frequency Synthesis, Voice Analysis
ISSN:	1976-7277
DOI:	10.3837/tiis.2022.02.018
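
The summary mentions two post-processing ideas, amplitude scaling and conversion to a logarithmic scale, without giving formulas. The Python sketch below is only a generic illustration of those two ideas under stated assumptions: peak-based scaling and a decibel magnitude spectrum are common choices, not necessarily what the authors used, and the function names `scale_amplitude` and `log_magnitude_spectrum`, the `target_peak` parameter, and the synthetic 220 Hz test tone are all hypothetical.

```python
import numpy as np


def scale_amplitude(signal, target_peak=0.9):
    """Rescale a waveform so its largest absolute sample equals target_peak.

    Assumption: simple peak scaling; the paper's exact rule is not given.
    """
    peak = np.max(np.abs(signal))
    if peak == 0.0:
        return signal.copy()  # silent input: nothing to scale
    return signal * (target_peak / peak)


def log_magnitude_spectrum(signal, eps=1e-10):
    """Return the magnitude spectrum in decibels (a logarithmic scale).

    eps avoids log10(0) for empty frequency bins.
    """
    magnitude = np.abs(np.fft.rfft(signal))
    return 20.0 * np.log10(magnitude + eps)


# Synthetic stand-in for a generated voice frame: one second of a
# 220 Hz tone sampled at 16 kHz (not data from the paper).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
frame = 0.3 * np.sin(2 * np.pi * 220.0 * t)

scaled = scale_amplitude(frame, target_peak=0.9)
spectrum_db = log_magnitude_spectrum(scaled)

# Locate the strongest frequency component after scaling.
peak_bin = int(np.argmax(spectrum_db))
peak_hz = peak_bin * sample_rate / len(scaled)
```

Scaling changes only the level of the signal, so the strongest component stays at the tone's frequency, while the decibel scale compresses the large dynamic range of the spectrum in a way that loosely mirrors how loudness is perceived.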