Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models
In China, stroke has been the first leading cause of death in recent years. It is a major cause of long-term physical and cognitive impairment, which bring great pressure on the National Public Health System. On the other hand, China is a big country, evaluation of the risk of getting stroke is impo...
Gespeichert in:
Veröffentlicht in: | Informatics in medicine unlocked 2021, Vol.26, p.100712, Article 100712 |
---|---|
Hauptverfasser: | , , , , , , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In China, stroke has been the first leading cause of death in recent years. It is a major cause of long-term physical and cognitive impairment, which bring great pressure on the National Public Health System. On the other hand, China is a big country, evaluation of the risk of getting stroke is important for the prevention and treatment of stroke in China.
A data set with 2000 hospitalized stroke patients in 2018 and 27583 residents during the year 2017 to 2020 is analyzed in this study. With the cleaned data, three models on stroke risk levels are built by using machine learning methods. The importance of “8+2” factors from China National Stroke Prevention Project (CSPP) is evaluated via decision tree and random forest models. The importance of more detailed features and their SHAP22SHAP: SHapley Additive exPlanations. values are evaluated and ranked via random forest model. Furthermore, a logistic regression model is applied to evaluate the probability of getting stroke for different risk levels.
Among all “8+2” risk factors of getting stroke, the decision tree model reveals that top three factors are Hypertension (0.4995),33The value of importance. Physical Inactivity (0.08486) and Diabetes Mellitus (0.07889), and the random forest model shows that top three factors are Hypertension (0.3966), Hyperlipidemia (0.1229) and Physical Inactivity (0.1146). In addition to “8+2” factors the importance of features for lifestyle information, demographic information and medical measurement are evaluated via random forest model. It shows that top five features are Systolic Blood Pressure (SBP) (0.3670), Diastolic Blood Pressure (DBP) (0.1541), Physical Inactivity (0.0904), Body Mass Index (BMI) (0.0721) and Fasting Blood Glucose (FBG)(0.0531). SHAP values show that DBP, Physical Inactivity, SBP, BMI, Smoking, FBG, and Triglyceride(TG) are positively correlated to the risk of getting stroke. High-density Lipoprotein (HDL) is negatively correlated to the risk of getting stroke. Combining with the data of 2000 hospitalized stroke patients, the logistic regression model shows that the average probabilities of getting stroke are 7.20%±0.55%44Confidence Interval with confidence level 95%. for the low-risk level patients, 19.02%±0.94% for the medium-risk level patients and 83.89%±0.97% for the high-risk level patients.
Based on the census data from Shanxi Province, we investigate stroke risk factors and their ranking. It shows that Hypertension, Physical Inactivity, and Ov |
---|---|
ISSN: | 2352-9148 2352-9148 |
DOI: | 10.1016/j.imu.2021.100712 |