A real-time Thai speech synthesizer on a mobile device

Bibliographic details
Main authors: Wongpatikaseree, K., Ratikan, A., Thangthai, A., Chotimongkol, A., Nattee, C.
Format: Conference proceedings
Language: English
Description
Abstract: Several Thai TTS systems are already available on resource-rich platforms such as personal computers. However, porting these systems to a resource-limited device such as a mobile phone is not an easy task: practical aspects, including application size and processing time, have to be considered. In this paper, we aim to develop a Thai speech synthesizer that can produce output speech in real time on a mobile device. Our synthesizer is based on Flite, an open-source synthesis library developed by Carnegie Mellon University. Flite is suitable for resource-limited devices because it is both small and fast. To use Flite as a text-to-speech engine for Thai, many components have to be modified. First, a word segmentation component and a Thai pronunciation dictionary are added to determine the word boundaries and the pronunciation of each word in the Thai input text. To minimize resource usage, a simple word segmentation algorithm, longest matching, is employed. Next, to handle Thai tones, we integrate tones with phones and define a tonal phone set for Thai. Lastly, a small Thai speech database is essential. For this, we transform a unit selection database into a diphone database by selecting only the necessary diphones. We conducted an experiment comparing our speech synthesizer with pTalk, an HMM-based speech synthesizer, in terms of both speed and sound quality, the latter measured by a subjective listening test. While the quality of our output speech may not be as good as that of pTalk, our system is much faster and more stable than pTalk.
DOI:10.1109/SNLP.2009.5340907
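The abstract names longest matching as the word segmentation algorithm, followed by a dictionary lookup that yields tone-integrated phones. The following is a minimal Python sketch of that general technique, not the authors' Flite (C) implementation. The tiny lexicon, the functions `segment_longest_matching` and `to_tonal_phones`, and the phone labels (a tone digit attached to the vowel, 0 = mid, 1 = low, 2 = falling, 3 = high, 4 = rising) are all hypothetical illustrations, not the paper's actual dictionary or tonal phone set.

```python
# Minimal sketch of longest-matching word segmentation plus dictionary
# pronunciation lookup. The lexicon and tonal phone labels are HYPOTHETICAL
# illustrations, not the paper's actual dictionary or phone set.

# Hypothetical tone-integrated phones: the vowel carries a tone digit
# (0 = mid, 1 = low, 2 = falling, 3 = high, 4 = rising).
LEXICON = {
    "ตา": ["t", "aa0"],
    "ตาก": ["t", "aa1", "k"],
    "กลม": ["k", "l", "o0", "m"],
    "ลม": ["l", "o0", "m"],
}

# Longest entry (in code points) bounds how far ahead we need to look.
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def segment_longest_matching(text, lexicon=LEXICON, max_len=MAX_WORD_LEN):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word that starts there; if none matches, emit one character."""
    words = []
    i = 0
    while i < len(text):
        match = None
        # Try candidate end positions from longest to shortest.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character: fall back to a single code point
        words.append(match)
        i += len(match)
    return words


def to_tonal_phones(words, lexicon=LEXICON):
    """Concatenate dictionary pronunciations. Unknown words are skipped here;
    a real front end would need a letter-to-sound fallback instead."""
    phones = []
    for w in words:
        phones.extend(lexicon.get(w, []))
    return phones


if __name__ == "__main__":
    words = segment_longest_matching("ตากลม")
    print(words)                   # ['ตาก', 'ลม']
    print(to_tonal_phones(words))  # ['t', 'aa1', 'k', 'l', 'o0', 'm']
```

The string "ตากลม" shows the trade-off of a greedy method: it can be read as ตา|กลม or ตาก|ลม, and longest matching always takes the longer first word (ตาก) without consulting context. In exchange it needs only the dictionary and a single left-to-right scan, which fits the abstract's goal of minimizing resource usage.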