Large Dataset Classification Using Parallel Processing Concept

Bibliographic details
Published in:	JOIV: International Journal on Informatics Visualization (Online), 2020-12, Vol. 4 (4), p. 191-194
Main authors:	Aljanabi, Mohammad; Ebraheem, Hind Ra'ad; Hussain, Zahraa Faiz; Md Fudzee, Mohd Farhan; Kasim, Shahreen; Ismail, Mohd Arfian; Meidelfi, Dwiny; Erianda, Aldo
Format:	Article
Language:	English
Subjects:	Apache Spark; large dataset; parallel SVMs; PCA
Online access:	Full text

Description
Summary:	Large data technologies have received much attention in the past few years, mainly because of their capacity to influence business analytics and data mining practices and their potential to enable a range of highly effective decision-making tools. With the growing number of modern applications (including social media and other web-based and healthcare applications) that generate large amounts of data in different forms and volumes, processing such data is becoming a challenge for conventional data processing tools. This has led to the emergence of big data analytics, which comes with many challenges of its own. This paper introduces the use of principal component analysis (PCA) for data size reduction, followed by SVM parallelization. The proposed scheme was executed on the Spark platform, and the experimental findings showed that it reduces the classifiers' classification time without much effect on classification accuracy.
ISSN:	2549-9610 2549-9904
DOI:	10.30630/joiv.4.4.361