Feature selection and machine learning methods for optimal identification and prediction of subtypes in Parkinson's disease

Detailed description

Bibliographic details
Published in: Computer methods and programs in biomedicine, 2021-07, Vol. 206, p. 106131-106131, Article 106131
Main authors: Salmanpour, Mohammad R., Shamsaei, Mojtaba, Rahmim, Arman
Format: Article
Language: English
Description
Summary:
• We explored hybrid ML systems including selector algorithms and classifiers.
• Radiomics features are important for classification and clustering tasks.
• The high score features selected by ensemble voting are not effective.
• The features directly selected via selector algorithms are more important.

The present work focuses on assessment of Parkinson's disease (PD), including both PD subtype identification (unsupervised task) and prediction (supervised task). We specifically investigate optimal feature selection and machine learning algorithms for these tasks.

We selected 885 PD subjects from longitudinal datasets (years 0–4; Parkinson's Progression Markers Initiative) and investigated 981 features, including motor, non-motor, and imaging features (SPECT-based radiomics features extracted using our standardized SERA software). Two different hybrid machine learning systems (HMLSs) were constructed and applied to the data to select optimal combinations in both tasks: (i) identification of subtypes in PD (unsupervised clustering), and (ii) prediction of these subtypes in year 4 (supervised classification). From the original data for years 0 (baseline) and 1, we created new datasets as inputs to the prediction task: (i, ii) CSD0 and CSD01, cross-sectional datasets from year 0 only and from both years 0 and 1, respectively; and (iii) TD01, a timeless dataset from both years 0 and 1. The PD subtype in year 4 was taken as the outcome. Finally, high score features were derived via ensemble voting based on their prioritizations from feature selector algorithms (FSAs).

In the clustering task, the optimal combinations used features selected by individual FSAs (547 of the 981), which enabled high correlation compared to using all features. In the prediction task, we were able to select optimal combinations resulting in an accuracy >90% only for the timeless dataset (TD01); there, the optimal combination used 77 features selected directly by FSAs. In both tasks, however, using only the high score features from ensemble voting did not yield acceptable performance, showing that optimal feature selection via individual FSAs is more effective.

Combining non-imaging information with SPECT-based radiomics features, together with optimal utilization of HMLSs, can enable robust identification of subtypes as well as appropriate prediction of these subtypes in PD patients. Moreover, use of timeless dataset, beyond cross-sectional datasets,
ISSN: 0169-2607, 1872-7565
DOI: 10.1016/j.cmpb.2021.106131
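
Illustrative note on the summary: the abstract says high score features were derived via ensemble voting over the prioritizations produced by several FSAs, but it does not state the voting rule. The sketch below assumes a simple Borda count as a stand-in. The function name `borda_vote`, the feature names, and the three rankings are hypothetical placeholders, not data or code from the study.

```python
from collections import defaultdict


def borda_vote(rankings, top_k):
    """Combine several feature rankings (best first) into one top_k list.

    Assumption: Borda count. In a ranking of length n, the feature at
    0-based position p receives n - p points. Features absent from a
    ranking receive no points from it. Ties are broken alphabetically
    so the result is deterministic.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - position
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:top_k], dict(scores)


# Hypothetical prioritizations from three feature selector algorithms.
fsa_rankings = [
    ["motor_1", "radiomic_1", "nonmotor_1", "radiomic_2"],
    ["radiomic_1", "motor_1", "radiomic_2", "nonmotor_1"],
    ["radiomic_1", "nonmotor_1", "motor_1", "radiomic_2"],
]

high_score_features, vote_scores = borda_vote(fsa_rankings, top_k=2)
print(high_score_features)
print(vote_scores)
```

In a comparison like the one the summary reports, the voted subset would be passed to the same clustering or classification step as the subset chosen by a single FSA, so that the two selection strategies can be scored on equal terms.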