A real-time Thai speech synthesizer on a mobile device
Main authors: | , , , , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Subjects: | |
Summary: | Several Thai TTS systems are already available on resource-rich platforms such as personal computers. However, porting these systems to a resource-limited device such as a mobile phone is not an easy task. Practical aspects, including application size and processing time, must be considered. In this paper, we aim to develop a Thai speech synthesizer that can produce output speech in real time on a mobile device. Our synthesizer is based on Flite, an open-source synthesis library developed by Carnegie Mellon University. Flite is suitable for resource-limited devices because it is both small and fast. To use Flite as a text-to-speech engine for Thai, many components have to be modified. First, a word segmentation component and a Thai pronunciation dictionary are added to determine the word boundaries and the pronunciation of each word in Thai input text. To minimize resource usage, a simple word segmentation algorithm, longest matching, is employed. Next, to handle the tones in Thai, we integrate tones with phones and define a tonal phone set for Thai. Lastly, a small Thai speech database is essential. For this, we transform a unit selection database into a diphone database by selecting only the necessary diphones. We conducted an experiment to compare our speech synthesizer with pTalk, an HMM-based speech synthesizer, in terms of both speed and sound quality, the latter measured by a subjective listening test. While the quality of our output speech may not be as good as that of pTalk, our system is much faster and more stable than pTalk. |
---|---|
DOI: | 10.1109/SNLP.2009.5340907 |
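
The summary mentions longest matching as the word segmentation algorithm. The following is a minimal sketch of that general technique, not the authors' implementation: the function name `longest_match_segment`, the toy lexicon, and the single-character fallback for unknown text are all assumptions made for illustration. At each position it greedily takes the longest dictionary entry that matches.

```python
# Illustrative sketch of greedy longest-matching word segmentation.
# The names and the toy lexicon below are hypothetical, not taken from the paper.

def longest_match_segment(text, lexicon):
    """Split text into words by always taking the longest lexicon match."""
    # Longest entry bounds how far ahead we ever need to look.
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                break
        else:
            # Assumed fallback: no entry matches, so emit one character.
            candidate = text[i]
        words.append(candidate)
        i += len(candidate)
    return words


# Toy lexicon containing a classic ambiguous Thai string.
TOY_LEXICON = {"ตา", "กลม", "ตาก", "ลม"}

segments = longest_match_segment("ตากลม", TOY_LEXICON)
```

With this toy lexicon the greedy choice yields `ตาก` followed by `ลม` rather than `ตา` followed by `กลม`, which shows the trade-off of the approach: it needs only a dictionary lookup per candidate, but it cannot use context to resolve ambiguous boundaries.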