A distributed approach for implementing multi-linear regression using gradient descent: Toward efficient cyber attacks detection algorithms



Bibliographic details
Main authors:	Aljanabi, Muna H.; Aljanabi, Kadhim B. S.
Format:	Conference proceeding
Language:	English
Subjects:	Accuracy; Algorithms; Configuration management; Datasets; Machine learning; Malware; Noise levels; Noise prediction; Performance prediction; Regression analysis; Regression models; Root-mean-square errors; Synthetic data
Online access:	Full text

Description
Summary:	This study investigates the performance of multi-linear regression models trained using the batch gradient descent algorithm on the Hadoop Streaming framework. The study compares the execution time and predictive accuracy achieved with multiple reducers versus a single-reducer configuration. Experiments were conducted on a large-scale synthetic dataset designed to mimic real-world scenarios, with a particular focus on malware detection. The dataset incorporated features with varying degrees of correlation and noise levels. Results indicate that the configuration with multiple reducers outperforms the single-reducer configuration in terms of both execution time and predictive accuracy. The multi-reducer configuration significantly reduces mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE), while achieving a higher coefficient of determination (R-squared). These improvements suggest better model performance and predictive accuracy with the multi-reducer setup. Practical implications and recommendations for optimizing MapReduce jobs for training large-scale multi-linear regression models in the context of malware detection are discussed. Overall, this research enhances our understanding of distributed machine learning algorithms in big data environments, emphasizing the importance of MapReduce configurations in achieving efficient and accurate training of regression models for malware detection.
ISSN:	0094-243X 1551-7616
DOI:	10.1063/5.0234364
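
Illustrative note: the summary refers to batch gradient descent organized as map and reduce steps and to four error metrics (MAE, MSE, RMSE, R-squared). The sketch below is not the authors' implementation and does not use Hadoop. It simulates the pattern in a single Python process, assuming a squared-error loss, and only shows how per-split partial gradients can be summed and how the metrics are defined. All names (`map_partial_gradient`, `reduce_gradients`, `train`, `regression_metrics`), the learning rate, the epoch count, and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (features, target)


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    # weights[0] is the intercept; the remaining weights pair with the features.
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def map_partial_gradient(weights: Sequence[float],
                         split: List[Row]) -> Tuple[List[float], int]:
    """Mapper role: sum the squared-error gradient over one input split."""
    grad = [0.0] * len(weights)
    for x, y in split:
        err = predict(weights, x) - y
        grad[0] += err
        for j, xj in enumerate(x, start=1):
            grad[j] += err * xj
    return grad, len(split)


def reduce_gradients(partials: List[Tuple[List[float], int]]) -> List[float]:
    """Reducer role: combine the partial sums into the mean gradient."""
    total_n = sum(n for _, n in partials)
    dims = len(partials[0][0])
    return [sum(g[j] for g, _ in partials) / total_n for j in range(dims)]


def train(splits: List[List[Row]], n_features: int,
          lr: float = 0.5, epochs: int = 1000) -> List[float]:
    """Batch gradient descent: one full pass over all splits per update."""
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        partials = [map_partial_gradient(weights, s) for s in splits]
        grad = reduce_gradients(partials)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE and R-squared from paired targets and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}


# Hypothetical noise-free toy data: y = 1 + 2*x1 + 3*x2 on a 3x3 grid.
rows: List[Row] = [((x1, x2), 1 + 2 * x1 + 3 * x2)
                   for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
splits = [rows[:5], rows[5:]]  # two "input splits"
weights = train(splits, n_features=2)
metrics = regression_metrics([y for _, y in rows],
                             [predict(weights, x) for x, _ in rows])
```

In an actual Hadoop Streaming job the mapper and reducer would be separate scripts exchanging text records over standard input and output, and each gradient descent iteration would typically be its own job. The sketch leaves that plumbing out and does not attempt to reproduce the single-reducer versus multi-reducer comparison reported in the summary.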