Phase Analysis of the Activity of a Voice Source

Bibliographic details
Published in: Acoustical Physics, 2021-03, Vol. 67 (2), p. 193-209
Main authors: Sorokin, V. N.; Leonov, A. S.
Format: Article
Language: English
Description
Abstract: Mathematical models are proposed that relate the parameters of a voice source to the parameters of the phase-frequency responses (PFR) of speech signal segments. In particular, it was found that the duration of source activity can be determined from the average length of the intervals between the zeros and discontinuities of these PFR. For synthetic and real speech signals, the established properties of the phase response and the proposed heuristic methods for its analysis are used to obtain numerical estimates of the fundamental-tone periods, the duration of voice source activity within these periods, and the moments at which voice source activity begins and ends. The existence of an upper limit of the fundamental-tone frequency range within which the estimation error does not exceed 5% is established experimentally. The average error in estimating the duration of voice source activity using the proposed method for speech segments from the Arctic database was less than 0.3% for two speakers and 6.2% for a third speaker. It is shown that the error in determining the values […] and […] depends on the properties of the voice source and increases significantly for […] Hz. The most probable error in estimating quantities for three speakers from the Arctic database is estimated as 1.5, 10.2, and 13.5%; for […], it is –9.7, –20.2, and –13.9%.
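
The link between phase-response structure and source duration can be illustrated with a toy case. The sketch below is not the authors' model: it assumes an idealized, zero-phase rectangular pulse, whose spectrum changes sign at multiples of 1/tau, so the mean spacing of the pi-discontinuities in its phase response gives back the pulse duration tau. The function name, sampling rate, and pulse length are hypothetical choices for illustration; real glottal pulses are neither rectangular nor zero-phase.

```python
import numpy as np


def pulse_duration_from_phase(x, fs):
    """Estimate pulse duration from the mean spacing of phase jumps.

    Assumption (illustrative only): x is a zero-phase rectangular pulse,
    so its phase response is piecewise constant with jumps of pi.
    """
    spectrum = np.fft.rfft(x)
    phase = np.angle(spectrum)
    # Phase steps reduced to (-pi, pi], so a +pi -> -pi wrap counts as no jump.
    step = np.angle(np.exp(1j * np.diff(phase)))
    jumps = np.flatnonzero(np.abs(step) > np.pi / 2)
    if len(jumps) < 2:
        raise ValueError("need at least two phase discontinuities")
    # Mean interval between discontinuities, converted from bins to Hz.
    mean_interval_hz = np.mean(np.diff(jumps)) * fs / len(x)
    return 1.0 / mean_interval_hz


fs = 16000          # sampling rate in Hz (hypothetical)
n = 4096            # segment length in samples (hypothetical)
m = 81              # pulse length in samples, odd so it can be centered

# Rectangular pulse centered on sample 0 (circularly), hence zero-phase.
x = np.zeros(n)
half = m // 2
x[:half + 1] = 1.0
x[-half:] = 1.0

tau_true = m / fs
tau_est = pulse_duration_from_phase(x, fs)
print(f"true duration: {tau_true * 1e3:.4f} ms, estimate: {tau_est * 1e3:.4f} ms")
```

The estimate is limited by the frequency-bin resolution fs/n, so a longer segment (or zero padding) tightens it. Nothing here reproduces the heuristics or error figures reported in the abstract.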
ISSN: 1063-7710, 1562-6865
DOI: 10.1134/S106377102102007X