Glottal inverse filtering by combining a constrained LP and an HMM-based generative model of glottal flow derivative

Bibliographic details
Published in:	Speech communication 2018-11, Vol.104, p.113-128
Main author:	Sasou, Akira
Format:	Article
Language:	English
Subjects:	Accuracy; Anatomical systems; AR-HMM; Autoregressive models; Biomedical engineering; Covariance; Derivatives; Filtration; Frequencies; Fundamental frequency; Glottal flow; Glottal inverse filtering; Linear prediction; Markov analysis; Markov chains; Model testing; Optimization; Regression analysis; Resonant frequencies; Speaker identification; Speech; Speech analysis; Speech processing; Speech recognition; Speech synthesis; Test sets; Vocal tract; Voice recognition
Online access:	Full text

Description
Abstract:	Glottal flow is expected to convey useful information that can be effectively used in several speech applications, such as speech synthesis, expressive speech processing, speaker recognition, and voice-based biomedical engineering. Glottal inverse filtering (GIF) estimates the glottal flow, which cannot be measured directly, from a speech signal without any prior knowledge. Although many GIF methods have been proposed, several studies have concluded that the estimation accuracy of conventional GIF methods tends to degrade, especially for speech signals with high fundamental frequencies. A method based on the auto-regressive hidden Markov model (AR-HMM) was introduced to improve the estimation accuracy of the AR filter representing the vocal tract from such high fundamental frequency speech signals. In the previous AR-HMM analysis, the HMM represented a generative model of an excitation source as an impulse train that might have little information related to any physically observable signal. Therefore, the learned HMM was not expected to convey essential information about the glottal flow. The main goal of the present study is to realize an HMM-based generative model of the glottal flow derivative by imposing constraints on AR filter optimization. The proposed AR-HMM analysis is expected to realize a robust GIF for high fundamental frequency speech signals. We evaluated the proposed method by using two test sets generated by a linear source-filter model and a physiological speech synthesizer. The results indicate that the AR-HMM analysis tends to outperform the closed phase covariance analysis on constrained linear prediction in the evaluation of the linear source-filter model-based test set.
ISSN:	0167-6393 1872-7182
DOI:	10.1016/j.specom.2018.07.002
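
The abstract refers to an AR filter that represents the vocal tract and to recovering the excitation by inverse filtering. As a rough illustration of that building block only, the sketch below runs plain autocorrelation-method linear prediction on a toy signal. It is not the constrained AR-HMM analysis of the article. The function names, the second-order filter `a_true`, and the impulse spacing `period` are all hypothetical values chosen for the example.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method linear prediction (a generic textbook estimator)."""
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Symmetric Toeplitz matrix of the normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Predictor: s[n] is approximated by sum_k a[k] * s[n-1-k]
    return np.linalg.solve(R, r[1:])

def inverse_filter(signal, a):
    """Apply A(z) = 1 - sum_k a[k] z^-(k+1) to obtain the prediction residual."""
    inv = np.concatenate(([1.0], -a))
    return np.convolve(signal, inv)[: len(signal)]

def synthesize(excitation, a):
    """Pass an excitation through the all-pole filter 1/A(z)."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = out[max(0, n - len(a)):n][::-1]  # out[n-1], out[n-2], ...
        out[n] = excitation[n] + np.dot(a[: len(past)], past)
    return out

# Toy source-filter signal: an impulse train through a stable 2nd-order AR filter.
a_true = np.array([1.3, -0.8])   # hypothetical "vocal tract" coefficients
period = 80                      # hypothetical spacing between impulses, in samples
excitation = np.zeros(800)
excitation[::period] = 1.0
speech = synthesize(excitation, a_true)

a_est = lp_coefficients(speech, order=2)   # estimate the AR filter from the signal
residual = inverse_filter(speech, a_est)   # estimated excitation
```

With widely spaced impulses, `a_est` lands close to `a_true` and `residual` is close to the original impulse train. Reducing `period` makes successive impulse responses overlap, which corresponds to the high fundamental frequency condition the abstract describes as difficult for conventional methods.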