Parasitic sorority of speech processing algorithms with an assortment of statistical toolkits

Bibliographic details
Published in: Journal of physics. Conference series 2021-08, Vol.1998 (1), p.12024
Main authors: Sudhakaran, Prathibha, Yadav, Ashwani Kumar, Karamchandani, Sunil
Format: Article
Language: English
Description
Abstract: Speech is a one-dimensional, quasi-non-stationary, time-varying signal produced by a sequence of sounds. Speech signals are random in nature and are easily corrupted by noise, so robust recognition plays an important role in speech processing. Many researchers have designed recognition systems around challenging parameters. Speech corpora can vary with environment, region, dialect, speaker age, and speaking rate. Pre-processing is the first step and includes framing, de-noising, and filtering. This paper focuses on speech techniques and the statistical open-source toolkits HTK, Julius, CMUSphinx, and Kaldi. The word error rates (WER) obtained with all the toolkits on the WSJ1 corpus show clearly that Kaldi stands out, offering the most advanced recipes and scripts for speech recognition systems. An Indian English corpus from IITM, implemented in Kaldi, yields a WER of 6.41, which is compared with results for other Indian and international languages and well-known corpora.
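The abstract names framing and filtering as pre-processing steps and compares the toolkits by word error rate. The following is a minimal sketch of both ideas in plain Python, under common textbook assumptions that are not taken from the paper: a first-order pre-emphasis filter with alpha = 0.97, Hamming-windowed overlapping frames (for example 25 ms frames with a 10 ms hop at 16 kHz, i.e. 400 and 160 samples), and WER computed as word-level edit distance (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions and N is the number of reference words. The function names are hypothetical; this is not the implementation used by HTK, Julius, CMUSphinx, or Kaldi, and de-noising is not shown.

```python
import math
from typing import List, Sequence


def pre_emphasis(signal: Sequence[float], alpha: float = 0.97) -> List[float]:
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used default.
    """
    if not signal:
        return []
    out = [float(signal[0])]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames, each multiplied by a Hamming window.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len < 2 or hop < 1:
        raise ValueError("frame_len must be >= 2 and hop >= 1")
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [
        [signal[start + n] * window[n] for n in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` returns 1/3: one substitution and one deletion over six reference words. Toolkits usually report this ratio as a percentage.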
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/1998/1/012024