Machine Learning Approach to Identify Dysregulation in the Plasma Proteome of CADASIL
Published in: | Alzheimer's & dementia 2023-12, Vol.19 (S11), p.n/a |
---|---|
Main authors: | , , , , , , , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Summary: | Background
The plasma proteome provides information to identify dysregulated molecular pathways underlying disease. Cerebrovascular diseases often manifest a significantly different plasma proteomic signature. Cerebral‐Autosomal‐Dominant‐Arteriopathy‐Subcortical‐Infarcts‐and‐Leukoencephalopathy (CADASIL) is one such vascular brain disease. The CADASIL plasma proteome has yet to be investigated.
Method
To investigate the plasma proteome in CADASIL, we used large‐scale proteomics, measuring over 7,000 proteins with aptamer‐based technology (SomaLogic) in 53 study participants (nCADASIL = 25; nControl = 28). We employed machine learning (ML) methods as an unbiased approach to uncover disease‐associated changes in proteomic networks. This approach rests on the premise that the protein sets that best classify disease state could be important biological drivers of disease. We developed a novel ML method that couples recursive feature extraction with logistic regression (LR) and XGBoost evaluators, as well as maximum‐relevance‐minimum‐redundancy with Random‐Forest and F‐statistic evaluators. The results of all four models were selectively aggregated to delineate the CADASIL plasma proteome signature while minimizing overfitting to the small sample. From these results, we developed a 45‐protein model of the CADASIL proteomic signature. To evaluate its classification ability, we measured the repeated stratified 10‐fold cross‐validation accuracy of an LR classifier. Next, to investigate whether the CADASIL proteomic signature is relevant to other neurodegenerative diseases with vascular components, we applied the classifier to an Alzheimer’s disease (AD) dataset. Finally, we investigated the functional signature of the 45 proteins through gene ontology (GO) analysis.
Result
The LR classifier was 99.8% accurate in CADASIL and 71.2% accurate in AD. This suggests that the CADASIL signature shares commonalities with other neurodegenerative diseases but also differs from them in pertinent ways. GO analysis of the 45 proteins yielded terms corresponding to extracellular matrix and collagen, cellular membrane lipoproteins and glycolipids, and proteasome biological pathways.
Conclusion
The proposed method allows for an initial unbiased discovery of important proteomic changes associated with disease state. The identified proteins would provide starting points for mechanistic studies. In addition, a panel of proteins could be used for screening at population level for r |
---|---|
ISSN: | 1552-5260 1552-5279 |
DOI: | 10.1002/alz.082198 |
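
The Method section mentions two steps that a small sketch may make more concrete: selectively aggregating the protein sets chosen by the four selectors, and splitting 53 participants (25 CADASIL, 28 control) into repeated stratified 10‐fold partitions. The abstract does not state the aggregation rule, so the vote threshold below (`min_votes`) is purely an assumption, as are the function names, the selector keys, the number of repeats, and the placeholder protein labels `P1`…`P5`. The sketch uses only the Python standard library and does not fit any classifier.

```python
import random
from collections import Counter, defaultdict


def consensus_features(selections, min_votes=2):
    """Keep features picked by at least `min_votes` selectors.

    The threshold is an assumed rule; the abstract only says the four
    models' results were "selectively aggregated".
    """
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def repeated_stratified_folds(labels, n_splits=10, n_repeats=3, seed=0):
    """Yield (repeat, fold, test_indices) with each class spread over all folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for r in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0  # rotate the starting fold so fold sizes stay balanced
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            for k, i in enumerate(shuffled):
                folds[(offset + k) % n_splits].append(i)
            offset = (offset + len(shuffled)) % n_splits
        for f, test in enumerate(folds):
            yield r, f, sorted(test)


# Hypothetical selector outputs; P1..P5 are placeholders, not real proteins.
selections = {
    "rfe_lr": ["P1", "P2", "P3"],
    "rfe_xgboost": ["P2", "P3", "P4"],
    "mrmr_random_forest": ["P3", "P5"],
    "mrmr_f_statistic": ["P1", "P3"],
}
print(consensus_features(selections))  # ['P1', 'P2', 'P3']

# Group sizes from the abstract: 25 CADASIL (1) and 28 controls (0).
labels = [1] * 25 + [0] * 28
for repeat, fold, test_idx in repeated_stratified_folds(labels, n_repeats=1):
    n_cadasil = sum(labels[i] for i in test_idx)
    print(repeat, fold, len(test_idx), n_cadasil)
```

With so few participants, each test fold holds only five or six samples, so a single split gives a noisy accuracy estimate; averaging over repeats with different shuffles is one common way to stabilize it.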