Personality perception in synthetic versus natural speech: The effects of voice quality and prosody

Bibliographic details
Published in:	The Journal of the Acoustical Society of America 2024-03, Vol.155 (3_Supplement), p.A336-A336
Authors:	Kim, Minjeong, Park, Jaehan, Jeong, Minhong, Song, Jieun
Format:	Article
Language:	English

Description
Abstract:	Synthetic speech technology, which now approaches human-like naturalness thanks to advances in deep learning, has shifted its focus to personality or persona design, now a common practice in the industry. The present study aimed to identify speech characteristics that affect personality impressions in synthetic and natural speech. Thirty native Korean speakers participated in a personality rating experiment in which they evaluated natural Korean sentences and their synthetic counterparts in terms of the Big Five personality model. Acoustic analyses were performed to examine voice quality and prosody, including Intonational Phrase (IP) boundary tones. The results revealed that scores for agreeableness, conscientiousness, and emotional stability increased overall when the voices contained greater aperiodicity in the harmonics (i.e., were likely breathier) and had lower energy. The results also demonstrated that different prosodic features affected personality perception in synthetic and natural speech: synthetic speech with a wider F0 range received higher scores for extroversion, openness, and emotional stability. In contrast, effects of IP boundary tones were found most frequently for female natural speech, which contained a wider range of tones, including multitonal boundary tones (e.g., LHL%). Our findings suggest that intonation is one of the key factors that can be adjusted to generate synthetic speech with various personalities.
ISSN:	0001-4966 (print), 1520-8524 (online)
DOI:	10.1121/10.0027729
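The abstract reports that synthetic speech with a wider F0 range was rated higher on extroversion, openness, and emotional stability, but the record does not say how F0 range was measured. The sketch below shows one common convention, offered only as an assumption and not as the authors' method: take an F0 track (one value per analysis frame, with unvoiced frames coded as 0), discard unvoiced frames, and express the distance between a low and a high percentile in semitones. The function name `f0_range_semitones` and the 5th/95th percentile defaults are hypothetical choices for illustration.

```python
import math


def f0_range_semitones(f0_hz, low_pct=5.0, high_pct=95.0):
    """Hypothetical helper: F0 range in semitones between two percentiles.

    Assumes unvoiced frames are coded as 0 (or a negative value) and drops them.
    Percentiles (rather than min/max) are an assumed choice to damp octave errors.
    """
    voiced = sorted(f for f in f0_hz if f > 0)
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")

    def percentile(values, pct):
        # Linear interpolation between the two closest ranks of a sorted list.
        pos = (len(values) - 1) * pct / 100.0
        lo = math.floor(pos)
        hi = math.ceil(pos)
        return values[lo] + (values[hi] - values[lo]) * (pos - lo)

    f_low = percentile(voiced, low_pct)
    f_high = percentile(voiced, high_pct)
    # One octave (a doubling of frequency) spans 12 semitones.
    return 12.0 * math.log2(f_high / f_low)


if __name__ == "__main__":
    # Toy F0 track in Hz; zeros mark unvoiced frames.
    track = [0, 180, 190, 210, 240, 0, 220]
    print(round(f0_range_semitones(track), 2))
```

A log (semitone) scale is used here because it makes ranges comparable across speakers with different baseline pitch, which matters when, as in this study, male and female voices are compared; a linear range in Hz would make higher-pitched voices appear more variable by default.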