Self-supervised learning for pathological speech detection
Saved in:
Main author: | |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Speech production is a complex phenomenon, wherein the brain orchestrates a
sequence of processes involving thought processing, motor planning, and the
execution of articulatory movements. However, this intricate execution of
various processes is susceptible to influence and disruption by various
neurodegenerative pathological speech disorders, such as Parkinson's disease,
resulting in dysarthria, apraxia, and other conditions. These disorders lead to
pathological speech characterized by abnormal speech patterns and imprecise
articulation. Diagnosing these speech disorders in clinical settings typically
involves auditory perceptual tests, which are time-consuming, and the diagnosis
can vary among clinicians depending on their experience, biases, and cognitive
load during diagnosis. Additionally, unlike neurotypical speakers, patients
with speech pathologies or impairments are unable to access various virtual
assistants such as Alexa, Siri, etc. To address these challenges, several
automatic pathological speech detection (PSD) approaches have been proposed.
These approaches aim to provide efficient and accurate detection of speech
disorders, thereby facilitating timely intervention and support for individuals
affected by these conditions. These approaches mainly vary in two aspects: the
input representations utilized and the classifiers employed. Due to the limited
availability of data, detection performance remains subpar.
Self-supervised learning (SSL) embeddings, such as wav2vec2, and their
multilingual versions, are being explored as a promising avenue to improve
performance. These embeddings leverage self-supervised learning techniques to
extract rich representations from audio data, thereby offering a potential
solution to address the limitations posed by the scarcity of labeled data. |
DOI: | 10.48550/arxiv.2406.02572 |
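The two-stage pipeline the abstract describes (frozen SSL embeddings pooled into an utterance-level vector, then a lightweight classifier) can be sketched as follows. This is a minimal illustration, not the paper's method: `ssl_embed` is a hypothetical stand-in for a real encoder such as wav2vec2, which in practice would be loaded from a pretrained checkpoint (e.g. via the `transformers` library), and the linear classifier weights would be learned from labeled pathological vs. neurotypical speech.

```python
import numpy as np

# Hypothetical stand-in for a frozen SSL encoder (e.g. wav2vec2): maps a
# raw waveform to a sequence of frame-level embeddings. A real system
# would run the audio through a pretrained network instead.
def ssl_embed(waveform: np.ndarray, dim: int = 768, hop: int = 320) -> np.ndarray:
    """Return a (frames, dim) embedding sequence for a 1-D waveform."""
    n_frames = max(1, len(waveform) // hop)
    # Deterministic per input within a run, purely for illustration.
    rng = np.random.default_rng(abs(hash(waveform.tobytes())) % (2**32))
    return rng.standard_normal((n_frames, dim))

def utterance_vector(waveform: np.ndarray) -> np.ndarray:
    """Mean-pool frame embeddings into one fixed-size utterance vector."""
    return ssl_embed(waveform).mean(axis=0)

def detect(waveform: np.ndarray, w: np.ndarray, b: float = 0.0) -> bool:
    """Linear decision on the pooled embedding; in practice w and b would
    be trained (e.g. logistic regression or an SVM) on labeled data."""
    score = float(utterance_vector(waveform) @ w + b)
    return score > 0.0  # True -> flagged as pathological
```

Pooling over time sidesteps variable utterance lengths, and keeping the SSL encoder frozen is one common way to cope with the scarce labeled data the abstract highlights, since only the small classifier head needs training.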