Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix

Bibliographic Details
Main Authors:	Deart, Vladimir; Mankov, Vladimir; Krasnova, Irina
Format:	Conference Proceeding
Language:	English
Subjects:	agglomerative clustering; distance matrix; euclidean distance; extremely randomized trees; Load modeling; machine learning; manhattan distance; Quality of service; random forest; Random forests; random trees embedding; Software defined networking; Supervised learning; traffic classification; Vegetation
Online Access:	Full text

Description
Summary:	We present a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous work, we achieved high accuracy (90-95%) on flows already known to the model using supervised Machine Learning methods, but in a dynamic SDN new network applications and flows appear more often than usual. To detect new flows, we propose the Agglomerative clustering method, which has not previously been used for network flow classification because early approaches to traffic clustering gave insufficient results and ran too slowly. This paper combines different Machine Learning methods so that Agglomerative clustering is responsible only for updating the class database, while Supervised Learning methods quickly classify known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling the cluster construction process, with the distances between flows determined by the Random Forest and Extra Trees methods. In the experimental part of the study, three further promising ways of determining distances are included for comparison: Random Trees Embedding, Euclidean distance, and Manhattan distance. Clustering results for TCP and UDP applications are presented for different numbers of clusters and different sizes of the initial sample. The experiments confirm the effectiveness of hierarchical clustering in traffic clustering tasks, provided that cluster construction is controlled.
ISSN:	2305-7254 2343-0737
DOI:	10.23919/FRUCT50888.2021.9347616
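
The summary compares several ways of building the distance matrix that drives agglomerative clustering. The following is a minimal sketch of only the two simplest baselines it names, Euclidean and Manhattan distance, feeding a precomputed matrix into a plain agglomerative procedure. It is not the authors' implementation: the Random Forest and Extra Trees distances are not reproduced, average linkage is an assumption (the summary does not name a linkage criterion), and the flow features and values are hypothetical.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(points, dist):
    """Build a symmetric pairwise distance matrix with a zero diagonal."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = dist(points[i], points[j])
    return d


def agglomerative(d, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed matrix.

    Starts with one cluster per item and repeatedly merges the two
    clusters with the smallest mean pairwise distance until only
    n_clusters remain. Returns one integer label per item.
    """
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        # a < b always holds, so deleting b does not shift index a.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(d)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical flows: (mean packet size in bytes, mean inter-arrival time in ms).
flows = [(1200, 2.0), (1250, 3.0), (1180, 2.5), (80, 20.0), (90, 22.0), (85, 19.0)]

labels_euclidean = agglomerative(distance_matrix(flows, euclidean), 2)
labels_manhattan = agglomerative(distance_matrix(flows, manhattan), 2)
print(labels_euclidean)
print(labels_manhattan)
```

Because the clustering step only consumes a distance matrix, any other way of measuring distance between flows, such as one derived from a tree ensemble, could be substituted for `distance_matrix` without changing `agglomerative`.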