Using probabilistic characteristic vector based on both phonetic and prosodic features for language identification

Bibliographic details
Main authors: S. A. Hosseini Amereei, M. M. Homayounpour
Format: Conference proceedings
Language: English
Description
Summary: Language identification (LID) is an important task in the indexing of audio signals. This paper introduces an LID system with a generative frontend based on both phonetic and prosodic features. The generative frontend is built upon an ensemble of Gaussian densities. Half of these Gaussian densities are trained to represent elementary speech sound units and the other half are trained to represent prosodic properties; together they characterize a wide variety of languages. Shifted Delta Cepstral (SDC) and Pitch Contour Polynomial Approximation (PCPA) features are used. The backend classifier is a Support Vector Machine (SVM). Several language identification experiments were conducted and the proposed improvements were evaluated using the OGI-MLTS corpus. When all systems are based on PCPA, an SVM with the Generalized Linear Discriminant Sequence (GLDS) kernel and an SVM with the Probabilistic Sequence Kernel (PSK) both outperform the GMM, improving LID performance by about 2.1% and 5.9%, respectively. Furthermore, an improvement of about 4% was achieved by combining phonetic and prosodic features in the four-language identification experiments.
DOI:10.1109/ISTEL.2010.5734122