Predictive Big Data Analytics: A Study of Parkinson's Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations

Bibliographic Details
Published in:	PLoS ONE, August 2016, Vol. 11 (8), e0157077
Authors:	Dinov, Ivo D; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W; Price, Nathan D; Van Horn, John D; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M; Dauer, William; Toga, Arthur W
Format:	Article
Language:	English
Keywords:	Aged; Algorithms; Alzheimer's disease; Alzheimers disease; Amyotrophic lateral sclerosis; Analysis; Analytics; Artificial intelligence; Big Data; Biology; Biology and Life Sciences; Biomarkers; Cerebellum; Classification; Complexity; Computer and Information Sciences; Data analysis; Data management; Data processing; Databases, Factual; Datasets; Demographics; Diagnosis; Diagnostic systems; Disease control; Disease Progression; Female; Forecasting; Genetics; Health risks; Humans; Informatics; Information management; Information science; Laboratories; Learning algorithms; Logistic Models; Machine learning; Male; Mathematical models; Medical diagnosis; Medical imaging; Medicine and Health Sciences; Model accuracy; Movement disorders; Neural networks; Neurodegeneration; Neurodegenerative diseases; Neuroimaging; Neurology; NMR; Nuclear magnetic resonance; Nursing schools; Parkinson disease; Parkinson Disease - diagnosis; Parkinson Disease - genetics; Parkinson Disease - pathology; Parkinson's disease; Parkinsons disease; People and Places; Physical Sciences; Predictions; Principal components analysis; Proteins; Research and Analysis Methods; Science; Science & Technology - Other Topics; Statistics; Support Vector Machine; Support vector machines; Trauma

Description
Abstract:	A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or lifestyle. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data.
We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuro
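
The abstract reports accuracy, sensitivity and specificity estimated with n-fold cross-validation, after rebalancing imbalanced cohorts. The Python sketch below illustrates only how those three quantities can be computed under stratified k-fold cross-validation, with the minority class rebalanced inside each training fold. Random oversampling, the k-nearest-neighbour classifier, every function name (stratified_folds, oversample, knn_predict, cross_validate) and the synthetic data are illustrative assumptions. They are not the authors' PPMI pipeline, their rebalancing method, or the classifiers they report.

```python
import random


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall of class 1) and specificity (recall of class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def oversample(X, y, rng):
    """Hypothetical rebalancing step: duplicate random minority rows until classes are equal."""
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    minority, majority = (idx0, idx1) if len(idx0) < len(idx1) else (idx1, idx0)
    if not minority:
        return list(X), list(y)  # nothing to resample from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = idx0 + idx1 + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_predict(X_train, y_train, x, k=3):
    """Assumed stand-in classifier: majority vote among the k nearest training rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = [label for _, label in dists[:k]]
    return 1 if 2 * sum(votes) > len(votes) else 0


def stratified_folds(y, k, rng):
    """Split indices into k folds that preserve the class proportions of y."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validate(X, y, k=5, seed=0):
    """Pool out-of-fold predictions; rebalance only the training part of each fold."""
    rng = random.Random(seed)
    y_true, y_pred = [], []
    for test_idx in stratified_folds(y, k, rng):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr, y_tr = oversample([X[i] for i in train_idx], [y[i] for i in train_idx], rng)
        for i in test_idx:
            y_true.append(y[i])
            y_pred.append(knn_predict(X_tr, y_tr, X[i]))
    return confusion_metrics(y_true, y_pred)


if __name__ == "__main__":
    # Synthetic, imbalanced toy cohort: 80 "controls" and 20 "cases" (not PPMI data).
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
    X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(20)]
    y = [0] * 80 + [1] * 20
    acc, sens, spec = cross_validate(X, y, k=5)
    print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Rebalancing is applied after the train/test split so that duplicated minority rows never appear in a test fold. Oversampling before splitting would let copies of the same subject land on both sides and inflate the reported metrics.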
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0157077