Personalized Spontaneous Speech Synthesis Using a Small-Sized Unsegmented Semispontaneous Speech
Published in: | IEEE/ACM transactions on audio, speech, and language processing, 2017-05, Vol.25 (5), p.1048-1060 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Abstract: | A systematic approach is proposed for synthesizing personalized spontaneous speech using a small-sized unsegmented speech corpus of the target speaker. First, an automatic segmentation algorithm is employed to segment and label a collected semispontaneous speech corpus of the target speaker. Then, a pretrained average voice model is adapted to the voice model of the target speaker using the segmented data. A postfilter based on the modulation spectrum is adopted to further improve the speaker similarity of the synthesized speech and to alleviate its over-smoothing problem. For generating spontaneous speech, a smoothing method applied at the prosodic word level is proposed to improve speech fluency. In an objective evaluation of spontaneous speech segmentation, the proposed method achieves higher segmentation accuracy than Viterbi-based forced alignment. The results of a subjective listening test also show that the proposed method improves the spontaneity and speaker similarity of the synthesized speech compared to the maximum likelihood linear regression based speaker adaptation method. |
---|---|
ISSN: | 2329-9290, 2329-9304 |
DOI: | 10.1109/TASLP.2017.2679603 |