Novel Feature Reduction (NFR) Model With Machine Learning and Data Mining Algorithms for Effective Disease Risk Prediction

Detailed description

Bibliographic details
Published in: IEEE Access 2020, Vol. 8, p. 184087-184108
Main authors: Pasha, Syed Javeed; Mohamed, E. Syed
Format: Article
Language: English
Description
Summary: Machine learning (ML) and data mining (DM) techniques now play a vital role in healthcare systems, converting the available data into useful knowledge. The literature reports that a 12% chance of error remains in the diagnosis of diseases by medical practitioners. Moreover, for effective disease risk prediction in medical analysis, more emphasis is placed on the area under the curve (AUC), together with accuracy, as an evaluation metric. However, the role of the AUC has not previously been characterized in detail. In this research article, a novel feature reduction (NFR) model aligned with ML and DM algorithms is proposed to reduce the error rate and further improve performance. The proposed NFR model comprises two approaches and uses the AUC in addition to accuracy to achieve robust and effective disease risk prediction. The first approach is a heuristic process that removes features and evaluates the resulting change in AUC and accuracy, seeking the best subset of features that contribute most to the prediction. The second approach evaluates the accuracy and AUC of each individual feature and forms subsets with the highest accuracies, the highest AUCs, and the least difference between the two; these subsets are then combined in various ways to obtain the best reduced set of highly relevant features. The model is tested on the benchmark public heart datasets from the University of California, Irvine (UCI) ML repository, and the results are promising. The highest accuracy and AUC achieved with the proposed NFR model are 95.52% and 99.20%, respectively, with 41.67% feature reduction. The accuracy is 4.22% higher than in recent existing research, with a significant 25% improvement in the running time of the algorithm.
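The first approach in the summary (reducing features while watching both AUC and accuracy) can be illustrated with a minimal sketch. This is not the authors' NFR implementation: the acceptance rule (drop a feature whenever neither metric gets worse), the stand-in scoring model in `evaluate`, and the toy data `ROWS` and `LABELS` are all assumptions made for illustration.

```python
# Minimal sketch of AUC- and accuracy-guided backward feature reduction.
# Assumptions (not from the paper): the toy data, the stand-in "model" in
# evaluate(), and the rule "accept a removal if neither metric gets worse".

def auc(scores, labels):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold):
    """Fraction of samples classified correctly by the rule `score >= threshold`."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)

def evaluate(rows, labels, features):
    """Hypothetical stand-in model: score = sum of the selected feature values."""
    scores = [sum(row[f] for f in features) for row in rows]
    threshold = sum(scores) / len(scores)  # cut at the mean score
    return accuracy(scores, labels, threshold), auc(scores, labels)

def backward_reduce(rows, labels, features):
    """Greedily drop one feature at a time while accuracy and AUC do not get worse."""
    current = list(features)
    best_acc, best_auc = evaluate(rows, labels, current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in current:
            trial = [g for g in current if g != f]
            acc, area = evaluate(rows, labels, trial)
            if acc >= best_acc and area >= best_auc:
                current, best_acc, best_auc = trial, acc, area
                improved = True
                break  # restart the scan on the reduced feature set
    return current, best_acc, best_auc

# Toy data: "a" separates the classes, "b" is misleading, "c" is weakly useful.
LABELS = [0, 0, 0, 1, 1, 1]
ROWS = [
    {"a": 1, "b": 9, "c": 1},
    {"a": 2, "b": 1, "c": 1},
    {"a": 3, "b": 8, "c": 2},
    {"a": 7, "b": 2, "c": 2},
    {"a": 8, "b": 7, "c": 2},
    {"a": 9, "b": 1, "c": 3},
]

kept, acc, area = backward_reduce(ROWS, LABELS, ["a", "b", "c"])
reduction = 100 * (3 - len(kept)) / 3  # percentage of features removed
print(kept, acc, area, round(reduction, 2))
```

In practice the stand-in `evaluate` would be replaced by cross-validated accuracy and AUC from an actual classifier; the greedy loop itself stays the same.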
ISSN:2169-3536
DOI:10.1109/ACCESS.2020.3028714