Glottal inverse filtering by combining a constrained LP and an HMM-based generative model of glottal flow derivative
Published in: | Speech Communication, 2018-11, Vol. 104, pp. 113-128 |
---|---|
First author: | |
Format: | Article |
Language: | English |
Keywords: | |
Abstract: | Glottal flow is expected to convey useful information that can be effectively used in several speech applications, such as speech synthesis, expressive speech processing, speaker recognition, and voice-based biomedical engineering. Glottal inverse filtering (GIF) estimates the glottal flow, which cannot be measured directly, from a speech signal without any prior knowledge. Although many GIF methods have been proposed thus far, several studies have concluded that the estimation accuracy of conventional GIF methods tends to degrade, especially when analyzing speech signals with high fundamental frequencies. A method based on the autoregressive hidden Markov model (AR-HMM) was introduced to improve the estimation accuracy of the AR filter representing the vocal tract from such high-fundamental-frequency speech signals. In that previous AR-HMM analysis, the HMM was a generative model of an excitation source represented as an impulse train, which may carry little information related to any physically observable signal. Therefore, the learned HMM was not expected to convey essential information about the glottal flow. The main goal of the present study is to realize an HMM-based generative model of the glottal flow derivative by imposing constraints on the AR filter optimization. The proposed AR-HMM analysis is expected to realize robust GIF for high-fundamental-frequency speech signals. We evaluated the proposed method using two test sets, one generated by a linear source-filter model and the other by a physiological speech synthesizer. The results indicate that, on the test set generated by the linear source-filter model, the AR-HMM analysis tends to outperform closed-phase covariance analysis with constrained linear prediction. |
---|---|
ISSN: | 0167-6393 1872-7182 |
DOI: | 10.1016/j.specom.2018.07.002 |
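To make the idea of inverse filtering concrete, the sketch below shows a conventional LP-based baseline, not the AR-HMM method described in the abstract and not closed-phase covariance analysis. It estimates an all-pole (AR) vocal-tract model 1/A(z), with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, using plain autocorrelation-method linear prediction over the whole signal, and then passes the signal through A(z). Under the common source-filter assumption that lip radiation acts roughly as a differentiator, the resulting residual is a crude estimate of the glottal flow derivative. The function names `lpc_autocorrelation`, `inverse_filter`, and `synthesize_all_pole` are hypothetical helpers written for this illustration; only NumPy is assumed.

```python
import numpy as np


def lpc_autocorrelation(x: np.ndarray, order: int) -> np.ndarray:
    """Estimate A(z) = 1 + a_1 z^-1 + ... + a_p z^-p with the autocorrelation
    method of linear prediction, solved by the Levinson-Durbin recursion.
    (Illustrative baseline only; not the constrained AR-HMM analysis.)"""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation lags r[0..order]
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction-error energy of the current stage
    for i in range(1, order + 1):
        # Reflection coefficient of stage i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        # Update a_1..a_{i-1} using the previous stage's coefficients
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(x: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Apply the FIR inverse filter A(z): e[n] = sum_j a_j x[n - j] (zero initial state)."""
    return np.convolve(x, a)[: len(x)]


def synthesize_all_pole(excitation: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Drive the all-pole filter 1 / A(z) with an excitation (zero initial state)."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y[n] = acc
    return y
```

For example, driving `synthesize_all_pole` with white noise through a known second-order A(z) and then calling `lpc_autocorrelation` on the output recovers coefficients close to the true ones, and `inverse_filter` with the true A(z) returns the excitation. With real speech the estimate would be computed frame by frame. Because plain LP also absorbs part of the source spectrum into A(z), this baseline is exactly the kind of conventional estimate that, according to the abstract, tends to degrade for high-fundamental-frequency speech.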