Normal-rate to fast-rate speech conversion using non-linear compression maps


Bibliographic details
Published in:	The Journal of the Acoustical Society of America 2016-10, Vol.140 (4), p.2965-2966
Main authors:	Fry, Michael D., Vatikiotis-Bateson, Eric
Format:	Article
Language:	English
Online access:	Full text

Description
Abstract:	This paper presents a new technique to convert normal-rate speech into intelligible fast-rate (speeded) speech. Speeded speech has long been recognized for its potential to improve comprehension of spoken media; however, current tools for significantly speeding playback of non-text media are insufficient because they rely on inaccurate phoneme analysis. With the ever-increasing amount of non-text media online, a method to speed playback that is agnostic of phonemes is needed. Our technique uses spectral and source components of the acoustics to generate a non-linear compression map that characterizes how conversational-rate speech signals are compressed to achieve analogous fast-rate speech signals. A data set containing conversational- and fast-rate speech pairs was processed to determine the compression map corresponding to each pair. A Recursive Neural Network (RNN) was trained on the set of normal-rate speech and the corresponding compression maps. The RNN was then used to generate compression maps for novel normal-rate speech and ultimately to output a fast-rate speech signal. Elicited fast-rate speech and speech produced by the conversion technique are now being compared perceptually for intelligibility and naturalness.
ISSN:	0001-4966 1520-8524
DOI:	10.1121/1.4969172
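
The abstract does not specify the form of the compression map, so the following is only a minimal sketch of what applying a non-linear time-compression map to a sequence of analysis frames could look like. It assumes the map is a per-frame local rate factor (values above 1 shorten that stretch of speech, values below 1 lengthen it); that representation, the function names `compression_map_to_indices` and `apply_compression_map`, and the nearest-frame selection are all illustrative assumptions, not the authors' method. The neural network that predicts the map is not modeled here.

```python
import numpy as np


def compression_map_to_indices(rate):
    """Turn a per-frame rate map into input-frame indices for the output.

    rate[t] is a hypothetical local compression factor for input frame t:
    the frame occupies 1 / rate[t] output frames.
    """
    rate = np.asarray(rate, dtype=float)
    if rate.ndim != 1 or rate.size == 0 or np.any(rate <= 0):
        raise ValueError("rate must be a non-empty 1-D array of positive values")

    n_in = rate.size
    # Output-time position of each input-frame boundary (n_in + 1 boundaries).
    out_boundaries = np.concatenate(([0.0], np.cumsum(1.0 / rate)))
    n_out = max(1, int(round(out_boundaries[-1])))

    # Centre of each output frame, mapped back to a position on the input axis.
    out_centres = np.arange(n_out) + 0.5
    in_positions = np.interp(out_centres, out_boundaries, np.arange(n_in + 1))

    # Nearest-frame selection: pick the input frame containing each position.
    return np.clip(np.floor(in_positions).astype(int), 0, n_in - 1)


def apply_compression_map(frames, rate):
    """Select input frames (rows of `frames`) according to the rate map."""
    frames = np.asarray(frames)
    if frames.shape[0] != len(rate):
        raise ValueError("frames and rate must have the same number of frames")
    return frames[compression_map_to_indices(rate)]


if __name__ == "__main__":
    # Eight dummy feature frames; the middle four are compressed 4x,
    # the outer ones are kept at their original rate.
    frames = np.arange(16.0).reshape(8, 2)
    rate = np.array([1, 1, 4, 4, 4, 4, 1, 1], dtype=float)
    print(compression_map_to_indices(rate))
    print(apply_compression_map(frames, rate).shape)
```

A constant map reduces to ordinary uniform speed-up, while a varying map compresses some stretches more than others, which is the sense in which the map is non-linear. Turning the selected frames back into audio would need a separate synthesis step (for example overlap-add), which this sketch leaves out.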