Toward assessment of human voice biomarkers of brain lesions through explainable deep learning
Saved in:
Published in: | Biomedical signal processing and control 2024-01, Vol.87, p.105457, Article 105457 |
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
Summary: | Lesions in the brain resulting from traumatic injuries or strokes can evolve into speech dysfunction in undiagnosed patients. Employing ML-based tools to analyze the prosody or articulatory phonetics of human speech could be advantageous for early screening of undetected brain injuries. Additionally, explaining the model’s decision-making process can support its predictions and help clinicians take appropriate measures to improve patient voice quality. However, traditional ML methods relying on low-level descriptors (LLDs) may sacrifice detailed temporal dynamics and other speech characteristics. Interpreting these descriptors can also be challenging, requiring significant effort to understand feature relationships and suitable value ranges. To address these limitations, this research paper introduces xDMFCCs, a method that identifies interpretable discriminatory acoustic biomarkers from a single speech utterance, providing local and global interpretations of deep learning models in speech applications. To validate this approach, it was used to interpret a Convolutional Neural Network (CNN) trained on Mel-frequency Cepstral Coefficients (MFCC) for the binary classification task of differentiating patient from control vocalizations. The ConvNet achieved promising results with a 75% f-score (75% recall, 76% precision), comparable to conventional machine learning baselines. What sets xDMFCCs apart is its explanation through a 2D time–frequency representation that preserves the complete speech signal. This representation offers a more transparent explanation for differentiating between patients and healthy controls, enhancing interpretability. This advancement enables more detailed and compelling studies of the speech acoustic traits of brain lesions. Furthermore, the findings have significant implications for developing low-cost and rapid diagnostics of unnoticed brain lesions.
•This article suggests using deep learning and speech analysis for early brain lesion detection.
•The method generates a representation that identifies relevant acoustic attributes and moments.
•We identified the top 10 most relevant features of both male and female Random Forest classifiers.
•We validate the results with 32 individuals, including patients with localized brain lesions. |
ISSN: | 1746-8094 |
DOI: | 10.1016/j.bspc.2023.105457 |
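The MFCC features the summarized paper trains its CNN on can be computed with a standard pipeline: frame the waveform, window each frame, take the power spectrum, apply a triangular mel filterbank, take the log, and decorrelate with a DCT. The sketch below is a minimal NumPy/SciPy illustration of that generic pipeline, not the paper's implementation; all parameter values (frame length, hop, filter counts) are common defaults chosen for the example, and the input is a synthetic tone standing in for a speech utterance.

```python
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=16000, n_mfcc=13, n_mels=26,
         frame_len=0.025, hop=0.010, n_fft=512):
    """Frame -> window -> power spectrum -> mel filterbank -> log -> DCT."""
    # Slice the signal into overlapping Hamming-windowed frames.
    flen, fhop = int(frame_len * sr), int(hop * sr)
    n_frames = 1 + (len(signal) - flen) // fhop
    idx = np.arange(flen)[None, :] + fhop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)

    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular filters spaced evenly on the mel scale.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Log mel energies, then a type-II DCT yields the cepstral coefficients.
    mel_energy = np.log(power @ fbank.T + 1e-10)
    return dct(mel_energy, type=2, axis=1, norm='ortho')[:, :n_mfcc]

# One second of a 200 Hz tone as a stand-in utterance.
sr = 16000
t = np.arange(sr) / sr
features = mfcc(np.sin(2 * np.pi * 200 * t), sr=sr)
print(features.shape)  # (n_frames, n_mfcc) -> (98, 13)
```

The resulting (time x coefficient) matrix is the kind of 2D representation a CNN can consume directly, which is what makes time-localized explanations such as the paper's xDMFCCs possible.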