An efficient predictive analytics system for high dimensional big data

Bibliographic Details
Published in:	Journal of King Saud University. Computer and information sciences 2022-01, Vol.34 (1), p.1521-1532
Main authors:	Oo, Myat Cho Mon; Thein, Thandar
Format:	Article
Language:	English
Subjects:	Big data; Dimension reduction; High dimensionality; Parameter optimization; Predictive analytics; Scalable random forest
Online access:	Full text

Description
Abstract:	The excessive growth of high-dimensional big data has made it more challenging for data scientists to efficiently obtain valuable knowledge from these data. Traditional data mining techniques are not suited to processing big data. Predictive analytics has grown in prominence alongside the emergence of big data. In this paper, an efficient predictive analytics system for high-dimensional big data is proposed by enhancing the scalable random forest (SRF) algorithm on the Apache Spark platform. SRF is enhanced by optimizing its hyperparameters, and prediction performance is improved by reducing the dimensions. The effectiveness of the proposed system is examined on five real-world datasets. Experimental results demonstrate that the proposed system achieves highly competitive performance compared with the random forest (RF) algorithm implemented in Spark MLlib.
ISSN:	1319-1578, 2213-1248
DOI:	10.1016/j.jksuci.2019.09.001