DEMENTIA: A Hybrid Attention-Based Multimodal and Multi-Task Learning Framework With Expert Knowledge for Alzheimer's Disease Assessment From Speech

The prevalence of Alzheimer's disease (AD) is rising annually, imposing a severe burden on patients and society. Assisted AD assessment is therefore crucial. The decline in language function, and the cognitive impairment it reflects, are key external manifestations of AD. Many studies have used speech analysis to achieve convenient, non-invasive, and low-cost AD detection. Although state-of-the-art research achieves high-precision AD detection using multimodal information, these studies often ignore interactions between different modalities and lack explanations for complex models. To address this, we propose a multi-task learning (MTL) AD assessment model that combines hybrid attention with multimodal representations. The model fuses audio, text, and expert knowledge to fully capture intra- and inter-modal interactions, achieving simultaneous AD detection and cognitive state prediction, along with comprehensive explainability analyses of the model and the individual modalities. Results show that the proposed method is sufficiently sensitive for AD assessment, achieving 89.58% accuracy and 91.67% recall on the classification task and a root mean square error of 4.31 on the regression task, with good generalization performance. Multimodal representations with expert knowledge and MTL both contribute to AD assessment performance. Explainability analyses indicate that, compared to healthy controls, AD patients exhibit slower speech rates, reduced syntactic complexity, and a greater tendency to use pause fillers and pronouns. Our study thus validates the effectiveness of the proposed method, addressing trust issues in clinical practice for assisted decision-making and further advancing the development of speech as a promising biomarker for early AD screening and cognitive decline monitoring.
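The record contains no code, but the abstract outlines a recognizable pattern: per-modality encoders, attention-based fusion, and two jointly trained task heads. The following is a minimal PyTorch sketch of that general pattern only, not the authors' DEMENTIA implementation; all module names, feature dimensions, and the loss weighting are illustrative assumptions.

```python
# Minimal sketch of the pattern the abstract describes: audio, text, and
# expert-knowledge features fused via attention, with joint classification
# (AD detection) and regression (cognitive score) heads. Dimensions and
# names are assumed, not taken from the paper.
import torch
import torch.nn as nn

class MultimodalMTL(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.proj_audio = nn.Linear(768, d_model)   # e.g. pretrained acoustic embeddings (assumed size)
        self.proj_text = nn.Linear(768, d_model)    # e.g. pretrained text embeddings (assumed size)
        self.proj_expert = nn.Linear(32, d_model)   # handcrafted linguistic/acoustic features (assumed size)
        # "Hybrid attention" stand-in: joint self-attention over all modality
        # tokens (captures intra- and inter-modal interactions), then a
        # cross-attention step that pools a fused representation.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Two task-specific heads sharing the fused representation.
        self.cls_head = nn.Linear(d_model, 2)  # AD vs. healthy control
        self.reg_head = nn.Linear(d_model, 1)  # cognitive score prediction

    def forward(self, audio, text, expert):
        # audio/text: (batch, seq, 768); expert: (batch, 32)
        a = self.proj_audio(audio)
        t = self.proj_text(text)
        e = self.proj_expert(expert).unsqueeze(1)   # (batch, 1, d_model)
        x = torch.cat([a, t, e], dim=1)             # concatenate modality tokens
        x, _ = self.self_attn(x, x, x)              # tokens attend within and across modalities
        fused, _ = self.cross_attn(e, x, x)         # expert token pools over all tokens
        fused = fused.squeeze(1)
        return self.cls_head(fused), self.reg_head(fused)

# Joint MTL objective: cross-entropy for detection plus MSE for score
# prediction, with an assumed task-balance weight.
def mtl_loss(logits, score_pred, labels, scores, alpha=0.5):
    ce = nn.functional.cross_entropy(logits, labels)
    mse = nn.functional.mse_loss(score_pred.squeeze(-1), scores)
    return ce + alpha * mse
```

Under this reading, the classification and regression heads share one fused representation, so the regression task acts as an auxiliary signal for detection; the paper's actual fusion and loss design may differ.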

Bibliographic Details

Published in: IEEE journal of biomedical and health informatics, 2024-12, p. 1-12
Authors: Zhang, Zhenglin; Wang, Tengfei; Hu, Zian; Yang, Li-Zhuang; Li, Hai
Format: Article
Language: English
ISSN: 2168-2194
DOI: 10.1109/JBHI.2024.3509620
Subjects: Accuracy; Acoustics; Alzheimer's disease; Artificial intelligence; Bioinformatics; Context modeling; Diseases; expert knowledge; Feature extraction; Hands; hybrid attention; Linguistics; multi-task learning; multimodal fusion; Multitasking; speech analysis
Online access: Order full text
subjects Accuracy
Acoustics
Alzheimer's disease
Artificial intelligence
Bioinformatics
Context modeling
Diseases
expert knowledge
Feature extraction
Hands
hybrid attention
Linguistics
multi-task learning
multimodal fusion
Multitasking
speech analysis