DEMENTIA: A Hybrid Attention-Based Multimodal and Multi-Task Learning Framework With Expert Knowledge for Alzheimer's Disease Assessment From Speech
The prevalence of Alzheimer's disease (AD) is rising annually, imposing a severe burden on patients and society. Therefore, assisted AD assessment is crucial. The decline in language function and the cognitive impairment it reflects are key external manifestations of AD. Many studies have utilized speech analysis to achieve convenient, non-invasive, and low-cost AD detection. Although state-of-the-art studies achieve high-precision AD detection using multimodal information, they often ignore interactions between modalities and lack explanations for their complex models. To address this, we propose a multi-task learning (MTL) AD assessment model that combines hybrid attention with multimodal representations. The model fuses audio, text, and expert knowledge to fully capture intra- and inter-modal interactions, achieving simultaneous AD detection and cognitive state prediction, along with comprehensive explainability analyses of the model and the individual modalities. Results show that the proposed method is sufficiently sensitive for AD assessment, achieving 89.58% accuracy and 91.67% recall on the classification task and a root mean square error of 4.31 on the regression task, with good generalization performance. Multimodal representations with expert knowledge and MTL both contribute to assessment performance. Explainability analyses indicate that, compared to healthy controls, AD patients exhibit slower speech rates, reduced syntactic complexity, and a greater tendency to use pause fillers and pronouns. Our study thus validates the effectiveness of the proposed method, addressing trust issues in clinical practice for assisted decision-making and further advancing the development of speech as a promising biomarker for early AD screening and cognitive decline monitoring.
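For orientation, the model described in this abstract can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' released implementation: the input dimensions, the use of standard `nn.MultiheadAttention` for both the intra-modal (self-) and inter-modal (cross-) attention in place of the paper's hybrid attention, and the mean-pooling fusion are all placeholders chosen for clarity.

```python
import torch
import torch.nn as nn

class HybridAttentionMTL(nn.Module):
    """Sketch: fuse audio, text, and expert-knowledge features with
    intra-/inter-modal attention, then branch into a classification head
    (AD detection) and a regression head (cognitive state prediction)."""

    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        # Project each modality into a shared space (dims are placeholders).
        self.audio_proj = nn.Linear(768, d_model)   # acoustic embeddings
        self.text_proj = nn.Linear(768, d_model)    # transcript embeddings
        self.expert_proj = nn.Linear(32, d_model)   # handcrafted linguistic features
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cls_head = nn.Linear(d_model, 2)       # AD vs. healthy control
        self.reg_head = nn.Linear(d_model, 1)       # e.g., a cognitive score

    def forward(self, audio, text, expert):
        a = self.audio_proj(audio)                  # (B, Ta, d)
        t = self.text_proj(text)                    # (B, Tt, d)
        e = self.expert_proj(expert).unsqueeze(1)   # (B, 1, d)
        t, _ = self.self_attn(t, t, t)              # intra-modal interactions
        ctx = torch.cat([a, e], dim=1)              # audio + expert context
        fused, _ = self.cross_attn(t, ctx, ctx)     # inter-modal interactions
        pooled = fused.mean(dim=1)                  # simple pooling
        return self.cls_head(pooled), self.reg_head(pooled).squeeze(-1)

# Multi-task training couples both heads through a joint loss; the paper's
# actual task weighting is not reproduced here.
model = HybridAttentionMTL()
audio, text = torch.randn(4, 50, 768), torch.randn(4, 40, 768)
expert, labels, score = torch.randn(4, 32), torch.randint(0, 2, (4,)), torch.rand(4) * 30
logits, pred = model(audio, text, expert)
loss = nn.functional.cross_entropy(logits, labels) + nn.functional.mse_loss(pred, score)
```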
| Published in: | IEEE journal of biomedical and health informatics 2024-12, p.1-12 |
|---|---|
| Main authors: | Zhang, Zhenglin; Wang, Tengfei; Hu, Zian; Yang, Li-Zhuang; Li, Hai |
| Format: | Article |
| Language: | eng |
| Subjects: | Accuracy; Acoustics; Alzheimer's disease; Artificial intelligence; Bioinformatics; Context modeling; Diseases; expert knowledge; Feature extraction; Hands; hybrid attention; Linguistics; multi-task learning; multimodal fusion; Multitasking; speech analysis |
| Online access: | Order full text |
container_end_page | 12 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE journal of biomedical and health informatics |
container_volume | |
creator | Zhang, Zhenglin; Wang, Tengfei; Hu, Zian; Yang, Li-Zhuang; Li, Hai |
description | The prevalence of Alzheimer's disease (AD) is rising annually, imposing a severe burden on patients and society. Therefore, assisted AD assessment is crucial. The decline in language function and the cognitive impairment it reflects are key external manifestations of AD. Many studies have utilized speech analysis to achieve convenient, non-invasive, and low-cost AD detection. Although state-of-the-art studies achieve high-precision AD detection using multimodal information, they often ignore interactions between modalities and lack explanations for their complex models. To address this, we propose a multi-task learning (MTL) AD assessment model that combines hybrid attention with multimodal representations. The model fuses audio, text, and expert knowledge to fully capture intra- and inter-modal interactions, achieving simultaneous AD detection and cognitive state prediction, along with comprehensive explainability analyses of the model and the individual modalities. Results show that the proposed method is sufficiently sensitive for AD assessment, achieving 89.58% accuracy and 91.67% recall on the classification task and a root mean square error of 4.31 on the regression task, with good generalization performance. Multimodal representations with expert knowledge and MTL both contribute to assessment performance. Explainability analyses indicate that, compared to healthy controls, AD patients exhibit slower speech rates, reduced syntactic complexity, and a greater tendency to use pause fillers and pronouns. Our study thus validates the effectiveness of the proposed method, addressing trust issues in clinical practice for assisted decision-making and further advancing the development of speech as a promising biomarker for early AD screening and cognitive decline monitoring. |
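The expert-knowledge markers reported in the explainability analysis (speech rate, pause fillers, pronoun use) can be computed directly from a timed transcript. The sketch below is a hypothetical feature extractor, not the paper's feature set; the filler and pronoun inventories are illustrative assumptions.

```python
import re

# Hypothetical inventories; the paper's actual definitions may differ.
PAUSE_FILLERS = {"uh", "um", "er", "eh", "hmm"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "this", "that"}

def speech_features(transcript: str, duration_sec: float) -> dict:
    """Simple expert-knowledge features from a timed transcript:
    speech rate (words/sec), pause-filler rate, and pronoun ratio."""
    words = re.findall(r"[a-z']+", transcript.lower())
    n = max(len(words), 1)  # guard against empty transcripts
    return {
        "speech_rate": len(words) / duration_sec,
        "filler_rate": sum(w in PAUSE_FILLERS for w in words) / n,
        "pronoun_ratio": sum(w in PRONOUNS for w in words) / n,
    }

# AD speech would be expected to show a lower speech_rate and higher
# filler/pronoun proportions than healthy-control speech.
print(speech_features("um i think uh it is um that thing", duration_sec=8.0))
```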
doi_str_mv | 10.1109/JBHI.2024.3509620 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2168-2194 |
ispartof | IEEE journal of biomedical and health informatics, 2024-12, p.1-12 |
issn | 2168-2194 |
language | eng |
recordid | cdi_ieee_primary_10806741 |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy; Acoustics; Alzheimer's disease; Artificial intelligence; Bioinformatics; Context modeling; Diseases; expert knowledge; Feature extraction; Hands; hybrid attention; Linguistics; multi-task learning; multimodal fusion; Multitasking; speech analysis |
title | DEMENTIA: A Hybrid Attention-Based Multimodal and Multi-Task Learning Framework With Expert Knowledge for Alzheimer's Disease Assessment From Speech |