Parasitic sorority of speech processing algorithms with an assortment of statistical toolkits
Published in: | Journal of Physics: Conference Series, 2021-08, Vol. 1998 (1), p. 12024 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Speech is a one-dimensional, quasi-non-stationary, time-varying signal produced by a sequence of sounds. Speech signals are random in nature and easily corrupted by noise, so recognition plays an important role in speech processing. Many researchers have designed recognition systems with challenging parameters. A speech corpus can vary with environment, region, dialect, age, and the rate at which words are spoken. Pre-processing is the first step and includes framing, de-noising and filtering. This paper focuses on speech techniques and open-source statistical toolkits such as HTK, Julius, CMUSphinx and Kaldi. The word error rates (WER) obtained with all the toolkits on the WSJ1 corpus make it clear that Kaldi stands out, offering the most advanced recipes and scripts for speech recognition systems. An Indian English corpus by IITM, implemented in Kaldi, yields a WER of 6.41 and is compared with other Indian and international languages and well-known corpora. |
---|---|
ISSN: | 1742-6588 1742-6596 |
DOI: | 10.1088/1742-6596/1998/1/012024 |
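
The summary ranks the toolkits by word error rate (WER). As a rough illustration of how that metric is conventionally defined, the sketch below counts the word-level substitutions, deletions and insertions needed to turn a hypothesis into the reference and divides by the reference length. It is not the scoring code of the paper or of any of the toolkits named above; the function name `word_error_rate`, the whitespace tokenisation and the choice to return a percentage are all assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage (hypothetical helper, not a toolkit API).

    WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein distance.
    """
    ref = reference.split()   # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion (second "the")
    # against a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Real scoring tools usually normalise text (case, punctuation, filler words) before alignment, so figures from this sketch are not directly comparable with published results.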