DEMENTIA: A Hybrid Attention-Based Multimodal and Multi-Task Learning Framework With Expert Knowledge for Alzheimer's Disease Assessment From Speech

Bibliographic Details
Published in: IEEE journal of biomedical and health informatics 2024-12, p.1-12
Main Authors: Zhang, Zhenglin; Wang, Tengfei; Hu, Zian; Yang, Li-Zhuang; Li, Hai
Format: Article
Language: English
Description
Summary: The prevalence of Alzheimer's disease (AD) is rising annually, imposing a severe burden on patients and society. Therefore, assisted AD assessment is crucial. The decline in language function and the cognitive impairment it reflects are key external manifestations of AD. Many studies have utilized speech analysis to achieve convenient, non-invasive, and low-cost AD detection. Although state-of-the-art studies achieve high-precision AD detection using multimodal information, they often ignore interactions between different modalities and lack explanations for complex models. To address this, we propose a multi-task learning (MTL) AD assessment model that combines hybrid attention with multimodal representations. The model fuses audio, text, and expert knowledge to fully capture intra- and inter-modal interactions, achieving simultaneous AD detection and cognitive state prediction, along with comprehensive explainability analyses of the model and various modalities. Results show that the proposed method is sufficiently sensitive in assessing AD, achieving 89.58% accuracy and 91.67% recall for the classification task and a root mean square error of 4.31 for the regression task with good generalization performance. Multimodal representations with expert knowledge and MTL contribute to AD assessment performance. Explainability analyses indicate that, compared to healthy controls, AD patients exhibit slower speech rates, reduced syntactic complexity, and a greater tendency to use pause fillers and pronouns. Therefore, our study validates the effectiveness of the proposed method, addressing trust issues in clinical practice for assisted decision-making and further advancing the development of speech as a promising biomarker for early AD screening and cognitive decline monitoring.
ISSN: 2168-2194
DOI: 10.1109/JBHI.2024.3509620