Addressing the missing data challenge in multi-modal datasets for the diagnosis of Alzheimer’s disease

Bibliographic details
Published in: Journal of Neuroscience Methods, 2022-06, Vol. 375, Article 109582
Main authors: Aghili, Maryamossadat; Tabarestani, Solale; Adjouadi, Malek
Format: Article
Language: English
Description
Summary: One of the challenges facing accurate diagnosis and prognosis of Alzheimer’s disease, beyond identifying the subtle changes that define its early onset, is the scarcity of sufficient data, compounded by the missing data challenge. Although the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database has many participants, many observations have numerous missing features, which often leads to the exclusion of potentially valuable data points in ongoing experiments, especially in longitudinal studies. Motivated by the necessity of examining all participants, even those with missing tests or imaging modalities, this study draws attention to the Gradient Boosting (GB) algorithm, which has an inherent capability of addressing missing values. The four groups considered are Cognitively Normal (CN), Early Mild Cognitive Impairment (EMCI), Late Mild Cognitive Impairment (LMCI) and Alzheimer's Disease (AD). Prior to applying state-of-the-art classifiers such as Support Vector Machine (SVM) and Random Forest (RF), the impact of imputing (i.e., replacing) missing data in common datasets with numerical techniques has been investigated and compared with the GB algorithm. Empirical evaluations show that GB performance is highly resilient to missing values in comparison to the SVM and RF algorithms. These latter algorithms can, however, be improved when coupled with a more sophisticated imputation technique such as soft-impute or the K-Nearest Neighbors (KNN) algorithm, provided the extent of data incompleteness is low. Classification accuracy improves by up to 3% in the multiclass classification of all four classes of subjects when all samples, including the incomplete ones, are considered during the model generation and testing phases. Unlike other methods, the proposed approach addresses the challenging multiclass classification of the ADNI dataset in the presence of different levels of missing data points. It also provides a comparative study of the effects of existing imputation techniques on block-wise missing data. Results of the proposed method are validated against gold standard methods used for AD classification.
• Optimize the machine learning model by combining ensemble methods with imputation techniques.
• Preserve statistical relevance of longitudinal studies by addressing the missing data challenge.
• Perform multiclass classification of Alzheimer’s disease.
• Segment and estimate volumes of disease-prone areas in the brain like the hippocampus.
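The summary credits Gradient Boosting with an inherent capability of addressing missing values but does not state the mechanism or the implementation used. One mechanism found in several tree-based boosting libraries is to learn, at each split, which branch samples with a missing value should follow. The sketch below is not the authors' code: it illustrates that idea on a single decision stump in plain Python. The function names, the toy feature values and the use of Gini impurity as the split criterion are assumptions made for illustration only.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_with_missing(values, labels):
    """Pick a threshold on one feature and a default branch for missing values.

    values: list of floats, with None marking a missing measurement.
    labels: list of class labels, same length as values.
    Returns (threshold, missing_goes_left, weighted_gini), or None when
    fewer than two distinct observed values exist.
    """
    observed = sorted({v for v in values if v is not None})
    # Candidate thresholds: midpoints between consecutive observed values.
    thresholds = [(a + b) / 2 for a, b in zip(observed, observed[1:])]
    n = len(values)
    best = None
    for t in thresholds:
        # Try routing all missing samples left, then right, and keep the
        # direction that gives the lower weighted impurity.
        for missing_left in (True, False):
            left, right = [], []
            for v, y in zip(values, labels):
                go_left = missing_left if v is None else v <= t
                (left if go_left else right).append(y)
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (t, missing_left, score)
    return best


# Toy, made-up example: two AD samples lack the measurement entirely.
values = [8.0, 7.0, 6.0, 2.0, None, None]
labels = ["CN", "CN", "CN", "AD", "AD", "AD"]
print(best_split_with_missing(values, labels))  # (4.0, True, 0.0)
```

In this toy case the incomplete samples are kept and routed to the low-value branch instead of being discarded or imputed, which is the kind of behavior the summary contrasts with the imputation step that SVM and RF require.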
ISSN: 0165-0270, 1872-678X
DOI: 10.1016/j.jneumeth.2022.109582