A Triplet Multimodel Transfer Learning Network for Speech Disorder Screening of Parkinson’s Disease

Bibliographic details
Published in:	International Journal of Intelligent Systems, 2024-03, Vol. 2024, p. 1-20
Main authors:	Zhao, Aite; Wang, Nana; Niu, Xuesen; Chen, Ming; Wu, Huimin
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial intelligence; Deep learning; Learning; Machine learning; Parkinson's disease; Pattern recognition; Speech disorders; Speech recognition; Time series; Tremor (Muscular contraction); Voice recognition
Online access:	Full text

Description
Abstract:	Deterioration in the quality of a person’s voice and speech is an early sign of Parkinson’s disease (PD). Although a number of computer-based methods have been developed that use patients’ speech for early diagnosis of PD, they focus only on a fixed pronunciation test; for example, a subject’s monosyllabic pronunciation is analyzed to determine whether the subject may have PD. Moreover, traditional speech analysis methods that extract only single-view speech features cannot provide a comprehensive feature representation. This paper studies several pronunciation tests for patients with PD, including the pronunciation of five monosyllabic vowels and a spontaneous dialogue. A triplet multimodel transfer learning network is proposed for identifying subjects with PD in these two groups of tests. First, mel-frequency cepstral coefficient (MFCC) features are extracted from the multisource speech data as a preprocessing step. Subsequently, a pretrained triplet model represents the features from three dimensions as the upstream task of the transfer learning framework. Finally, the pretrained model is reconstructed as a novel model that integrates the triplet model, a temporal model, and an auxiliary layer as the downstream task, and the weights are updated through fine-tuning to identify abnormal speech. Experimental results show that the highest PD detection rates in the two groups of tests are 99% and 90%, respectively, which outperform a large number of internationally popular pattern recognition algorithms and can serve as a baseline for other researchers in this field.
ISSN:	0884-8173 1098-111X
DOI:	10.1155/2024/8890592
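
Illustrative note on the MFCC preprocessing step named in the abstract: the following is a minimal, dependency-free sketch of how MFCC features are commonly computed (framing, Hamming window, power spectrum, mel filterbank, log, DCT-II). It is not the authors' implementation. The abstract does not state frame length, hop size, filter count, or number of coefficients, so all parameter values, function names, and the synthetic test tone are assumptions chosen for illustration.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but fine for a short demo)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, mapped to FFT bin indices
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * n_bins
        for k in range(lo, mid):      # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs cepstral coefficients per frame.
    All default parameter values are illustrative assumptions."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]  # Hamming window
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame)
        energies = [sum(p * w for p, w in zip(spec, row)) for row in bank]
        log_e = [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)
        features.append(dct2(log_e, n_coeffs))
    return features

# Synthetic 220 Hz tone, 0.1 s at 8 kHz (a stand-in for a sustained vowel recording)
sample_rate = 8000
signal = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(800)]
feats = mfcc(signal, sample_rate)
print(len(feats), "frames x", len(feats[0]), "coefficients")
```

In practice an established audio library would normally replace the naive DFT and hand-built filterbank; the per-frame coefficient vectors form the time series that a downstream model would consume.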