An amalgam KNN to predict diabetes mellitus


Bibliographic Details
Main authors: NirmalaDevi, M., Appavu, Subramanian, Swathi, U. V.
Format: Conference paper
Language: English
Description
Summary: Medical data mining extracts hidden patterns from medical data. This paper presents the development of an amalgam model for classifying the Pima Indian Diabetes Database (PIDD). The amalgam model combines k-means with k-Nearest Neighbor (KNN) classification and multi-step preprocessing. Many researchers have found that the KNN algorithm achieves very good performance on different data sets. In this amalgam model, data quality is improved by removing noisy instances, which in turn improves the accuracy and efficiency of the KNN algorithm. k-means clustering is used to identify and eliminate incorrectly classified instances, and missing values are replaced by means and medians. A fine-tuned classification is then performed with KNN, taking the correctly clustered instances of the preprocessed subset as input. The best choice of k depends on the data; generally, larger values of k reduce the effect of noise on the classification. A good k is selected by cross-validation. The aim of this paper is to determine the value of k that yields the best classification accuracy on PIDD using the amalgam KNN. Experimental results show that the proposed amalgam KNN with preprocessing produces the best results for different k values; with a larger k, the proposed model achieves a classification accuracy of 97.4%. Ten-fold cross-validation with a larger k value produces better classification accuracy for PIDD. The results are also compared with simple KNN and a cascaded k-means + KNN for the same k values.
DOI:10.1109/ICE-CCN.2013.6528591
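The pipeline the abstract describes (impute missing values, drop instances whose k-means cluster disagrees with their class label, then tune KNN's k by ten-fold cross-validation) can be sketched with scikit-learn. This is only an illustration of the general idea: the synthetic data, the two-cluster setting, the candidate k values, and the mean-only imputation are assumptions standing in for details the abstract does not give.

```python
# Sketch of an amalgam k-means + KNN pipeline, assuming scikit-learn.
# The random data below is a stand-in for the PIDD feature matrix.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # 8 numeric features, like PIDD
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic binary outcome

# Step 1: preprocessing -- simulate missing entries, then replace them
# with column means (the abstract mentions means and medians).
mask = rng.random(X.shape) < 0.05
X[mask] = np.nan
col_means = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_means, X)

# Step 2: k-means with two clusters; map each cluster to its majority
# class and drop instances whose label disagrees (treated as noisy).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
clusters = km.labels_
majority = {c: int(np.round(y[clusters == c].mean())) for c in (0, 1)}
keep = np.array([majority[c] == yi for c, yi in zip(clusters, y)])
X_clean, y_clean = X[keep], y[keep]

# Step 3: select k for KNN via ten-fold cross-validation on the
# cleaned subset, as the abstract suggests.
best_k, best_acc = None, -1.0
for k in (1, 3, 5, 7, 9):
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                          X_clean, y_clean, cv=10).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k, round(best_acc, 3))
```

On real PIDD data, zeros in columns such as glucose or blood pressure are commonly treated as missing markers before imputation; the larger-k behavior the abstract reports can be probed by widening the candidate list of k values.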