A Data-Driven Heart Disease Prediction Model Through K-Means Clustering-Based Anomaly Detection

Bibliographic Details
Published in:	SN computer science 2021-04, Vol.2 (2), p.112, Article 112
Main authors:	Ripan, Rony Chowdhury; Sarker, Iqbal H.; Hossain, Syed Md. Minhaz; Anwar, Md. Musfique; Nowrozy, Raza; Hoque, Mohammed Moshiul; Furhad, Md. Hasan
Format:	Article
Language:	eng
Subjects:	Algorithms; Anomalies; Cardiovascular disease; Classification; Cluster analysis; Clustering; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Data analysis; Data mining; Data Structures and Information Theory; Datasets; Health care; Heart; Heart diseases; Information Systems and Communication Service; Machine learning; Neural networks; Original Research; Outliers (statistics); Pattern Recognition and Graphics; Prediction models; Regression analysis; Resultants; Sensors; Software Engineering/Programming and Operating Systems; Support vector machines; Vector quantization; Vision
Online access:	Full text

Description
Abstract:	Heart disease, also known as cardiovascular disease, has been the leading cause of death worldwide over the past few decades. For early diagnosis, a data-driven prediction model that considers the associated risk factors of heart disease can play a significant role in the healthcare domain. However, to build an effective model with machine learning techniques, the quality of the data, e.g., data without "anomalies" or outliers, is important. This research investigates anomaly detection in the healthcare domain to predict heart disease effectively using the unsupervised K-means clustering algorithm. Our proposed model first determines an optimal value of K using the Silhouette method and forms the clusters used to find the anomalies. We then eliminate the identified anomalies from the data and employ five popular machine learning classification techniques, namely K-nearest neighbor, random forest, support vector machine, naive Bayes, and logistic regression, to build the resultant prediction model. The efficacy of the proposed methodology is demonstrated on a standard heart disease dataset. We also plot the data to check the accuracy of the anomaly detection in our experimental analysis.
ISSN:	2662-995X 2661-8907
DOI:	10.1007/s42979-021-00518-7
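
The pipeline outlined in the abstract (choose K by silhouette score, cluster with K-means, drop the anomalies, then train the five classifiers) can be sketched roughly as follows with scikit-learn. This is an illustrative sketch and not the authors' implementation. The abstract does not say how anomalies are identified from the clusters, so the rule used here (flag points whose distance to their own centroid exceeds the 95th percentile) is an assumption, as are the synthetic stand-in data, the helper names `choose_k` and `anomaly_mask`, and all hyperparameters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def choose_k(X, k_values=range(2, 7), seed=0):
    """Return the K with the highest mean silhouette score (hypothetical helper)."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)


def anomaly_mask(X, k, percentile=95.0, seed=0):
    """Flag points far from their own centroid.

    Assumed rule: the abstract does not specify the anomaly criterion.
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    return dist > np.percentile(dist, percentile)


# Synthetic stand-in for a heart disease table (assumption: 13 numeric features,
# binary target). Scaling the full table up front is a simplification for brevity.
X_raw, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X = StandardScaler().fit_transform(X_raw)

# Step 1: pick K by silhouette score. Step 2: cluster and remove flagged points.
best_k = choose_k(X)
is_anomaly = anomaly_mask(X, best_k)
X_clean, y_clean = X[~is_anomaly], y[~is_anomaly]

# Step 3: train the five classifiers named in the abstract on the cleaned data.
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y_clean, test_size=0.25, random_state=0, stratify=y_clean
)
classifiers = {
    "K-nearest neighbor": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(random_state=0),
    "Support vector machine": SVC(),
    "Naive Bayes": GaussianNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
}
accuracy = {
    name: clf.fit(X_train, y_train).score(X_test, y_test)
    for name, clf in classifiers.items()
}
for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
```

To judge whether anomaly removal helps on a real dataset, the same classifiers would also need to be trained on the uncleaned data and the two sets of scores compared; the accuracies printed here come from synthetic data and say nothing about the paper's results.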