Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix

We are presenting a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous works, we managed to achieve high accuracy (90-95%) on the database of flows known for the model using Machine Learning (Supervised Learning) m...

Detailed description

Bibliographic details
Main authors:	Deart, Vladimir ; Mankov, Vladimir ; Krasnova, Irina
Format:	Conference proceeding
Language:	eng
Subjects:	agglomerative clustering ; distance matrix ; euclidean distance ; extremely randomized trees ; Load modeling ; machine learning ; manhattan distance ; Quality of service ; random forest ; Random forests ; random trees embedding ; Software defined networking ; Supervised learning ; traffic classification ; Vegetation
Online access:	Full text

container_end_page	88
container_issue	1
container_start_page	81
container_title
container_volume	28
creator	Deart, Vladimir ; Mankov, Vladimir ; Krasnova, Irina
description	We present a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous work, we achieved high accuracy (90-95%) on the database of flows known to the model using supervised Machine Learning methods, but in a dynamic SDN new network applications and flows appear more often than usual. To detect new flows, we propose the Agglomerative clustering method, which has never been used to solve the problem of network flow classification because early approaches to traffic clustering gave insufficient results and its speed of operation was too low. This paper combines different Machine Learning methods so that Agglomerative clustering is responsible only for updating the class database, while Supervised Learning methods are responsible for quickly classifying known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling the cluster construction process: the distances between flows are determined using the Random Forest and Extra Trees methods. In the experimental part of the study, three further promising ways of determining distances are given for comparison: Random Trees Embedding, Euclidean distance, and Manhattan distance. Clustering results for TCP and UDP applications are presented for different numbers of clusters and different sizes of the initial sample. The experiments confirm the effectiveness of hierarchical clustering in traffic clustering tasks, provided that cluster construction is controlled.
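The abstract's core idea, agglomerative clustering driven by a precomputed distance matrix, can be illustrated with a minimal sketch. It is not the authors' implementation: the flow features, the `flows` sample, the helper names, and the choice of average linkage are all assumptions made for illustration, and only the Euclidean and Manhattan distances named in the abstract are shown (the Random Forest and Extra Trees based distances are not reproduced here).

```python
from itertools import combinations

# Hypothetical toy flows: (mean packet size in bytes, mean inter-arrival time in ms).
flows = [(60.0, 1.0), (64.0, 1.2), (58.0, 0.9),
         (1400.0, 20.0), (1420.0, 22.0), (1380.0, 19.0)]

def euclidean(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(points, metric):
    """Full pairwise distance matrix under the given metric."""
    n = len(points)
    return [[metric(points[i], points[j]) for j in range(n)] for i in range(n)]

def agglomerative(dist, n_clusters):
    """Naive agglomerative clustering on a precomputed distance matrix.

    Average linkage is an assumption; the abstract does not name a linkage.
    Returns one integer cluster label per input point.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

labels_l2 = agglomerative(distance_matrix(flows, euclidean), 2)
labels_l1 = agglomerative(distance_matrix(flows, manhattan), 2)
```

Because the clustering step only sees the matrix, swapping the metric (or substituting a tree-based dissimilarity) changes nothing else in the procedure, which is the comparison the abstract describes.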
doi_str_mv	10.23919/FRUCT50888.2021.9347616
format	Conference Proceeding
fullrecord	<record><control><sourceid>doaj_ieee_</sourceid><recordid>TN_cdi_ieee_primary_9347616</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9347616</ieee_id><doaj_id>oai_doaj_org_article_3f7c20f1011d46dfa5d5659f15e14701</doaj_id><sourcerecordid>oai_doaj_org_article_3f7c20f1011d46dfa5d5659f15e14701</sourcerecordid><originalsourceid>FETCH-LOGICAL-d269t-13de7ab52e8cc36ab4785d9d83d6f0f032112ce596bac5ef10c8f36f948a0d183</originalsourceid><addsrcrecordid>eNpN0NtKAzEQBuAoCh6fwJu8QGsOm2xyWeupUBWkertMk0mb2jYlG09v72pFnJsZBv6PYQihnPWFtNyeXz8-DSeKGWP6ggnet7KqNdc75Mgqoa2outolh0Iy1auFqvb-zQfktG0XjDFhlLa2PiSLwWy2TCvMUOIb0uHytS2Y43pGU6D3WN5TfqGTDCFERy-gRU_Tmj5Djum1pYPNJidwc2xpSfQSu-gqrr_TZY70MrYF1g7pHZQcP07IfoBli6e__Zg8XV9Nhre98cPNaDgY93x3fulx6bGGqRJonJMaplVtlLfeSK8DC0wKzoVDZfUUnMLAmTNB6mArA8xzI4_JaOv6BItmk-MK8meTIDY_i5RnDeQS3RIbGWonWEdw7ivtAyivtLKBK-RVzXhnnW2tiIh_1u_L5RfEi3VM</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix</title><source>DOAJ Directory of Open Access Journals</source><creator>Deart, Vladimir ; Mankov, Vladimir ; Krasnova, Irina</creator><creatorcontrib>Deart, Vladimir ; Mankov, Vladimir ; Krasnova, Irina</creatorcontrib><description>We are presenting a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous works, we managed to achieve high accuracy (90-95%) on the database of flows known for the model using Machine Learning (Supervised Learning) methods but in a dynamic SDN new network applications and flows appear more often than usual. 
For detection of new flows it is proposed to use the Agglomerative clustering method, which has never been used to solve the problem of network flow classification, because early approaches to traffic clustering gave insufficient results and the speed of its operation was too low. This paper offers a combination of different Machine Learning methods in such a way that Agglomerative clustering is responsible only for updating the class database, and Supervised Learning methods are responsible for quickly classifying known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling the cluster construction process by determining the distances between flows using the Random Forest and Extra Trees methods. In the experimental part of the study, three more most promising ways of determining distances are given for comparison: Random Trees Embedding, Euclidean and Manhattan distance. Results of clustering of TCP and UDP applications for different number of clusters and different size of the initial sample are presented. 
Experimental studies confirm the effectiveness of using hierarchical clustering in traffic clustering tasks under the condition of controlled cluster construction.</description><identifier>ISSN: 2305-7254</identifier><identifier>EISSN: 2305-7254</identifier><identifier>EISSN: 2343-0737</identifier><identifier>EISBN: 9526924444</identifier><identifier>EISBN: 9789526924441</identifier><identifier>DOI: 10.23919/FRUCT50888.2021.9347616</identifier><language>eng</language><publisher>FRUCT</publisher><subject>agglomerative clustering ; distance matrix ; euclidean distance ; extremely randomized trees ; Load modeling ; machine learning ; manhattan distance ; Quality of service ; random forest ; Random forests ; random trees embedding ; Software defined networking ; Supervised learning ; traffic classification ; Vegetation</subject><ispartof>2021 28th Conference of Open Innovations Association (FRUCT), 2021, Vol.28 (1), p.81-88</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>309,310,314,780,784,789,790,864,2102,23930,23931,25140,27924,27925</link.rule.ids></links><search><creatorcontrib>Deart, Vladimir</creatorcontrib><creatorcontrib>Mankov, Vladimir</creatorcontrib><creatorcontrib>Krasnova, Irina</creatorcontrib><title>Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix</title><title>2021 28th Conference of Open Innovations Association (FRUCT)</title><addtitle>FRUCT</addtitle><description>We are presenting a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). 
In previous works, we managed to achieve high accuracy (90-95%) on the database of flows known for the model using Machine Learning (Supervised Learning) methods but in a dynamic SDN new network applications and flows appear more often than usual. For detection of new flows it is proposed to use the Agglomerative clustering method, which has never been used to solve the problem of network flow classification, because early approaches to traffic clustering gave insufficient results and the speed of its operation was too low. This paper offers a combination of different Machine Learning methods in such a way that Agglomerative clustering is responsible only for updating the class database, and Supervised Learning methods are responsible for quickly classifying known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling the cluster construction process by determining the distances between flows using the Random Forest and Extra Trees methods. In the experimental part of the study, three more most promising ways of determining distances are given for comparison: Random Trees Embedding, Euclidean and Manhattan distance. Results of clustering of TCP and UDP applications for different number of clusters and different size of the initial sample are presented. 
Experimental studies confirm the effectiveness of using hierarchical clustering in traffic clustering tasks under the condition of controlled cluster construction.</description><subject>agglomerative clustering</subject><subject>distance matrix</subject><subject>euclidean distance</subject><subject>extremely randomized trees</subject><subject>Load modeling</subject><subject>machine learning</subject><subject>manhattan distance</subject><subject>Quality of service</subject><subject>random forest</subject><subject>Random forests</subject><subject>random trees embedding</subject><subject>Software defined networking</subject><subject>Supervised learning</subject><subject>traffic classification</subject><subject>Vegetation</subject><issn>2305-7254</issn><issn>2305-7254</issn><issn>2343-0737</issn><isbn>9526924444</isbn><isbn>9789526924441</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2021</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpN0NtKAzEQBuAoCh6fwJu8QGsOm2xyWeupUBWkertMk0mb2jYlG09v72pFnJsZBv6PYQihnPWFtNyeXz8-DSeKGWP6ggnet7KqNdc75Mgqoa2outolh0Iy1auFqvb-zQfktG0XjDFhlLa2PiSLwWy2TCvMUOIb0uHytS2Y43pGU6D3WN5TfqGTDCFERy-gRU_Tmj5Djum1pYPNJidwc2xpSfQSu-gqrr_TZY70MrYF1g7pHZQcP07IfoBli6e__Zg8XV9Nhre98cPNaDgY93x3fulx6bGGqRJonJMaplVtlLfeSK8DC0wKzoVDZfUUnMLAmTNB6mArA8xzI4_JaOv6BItmk-MK8meTIDY_i5RnDeQS3RIbGWonWEdw7ivtAyivtLKBK-RVzXhnnW2tiIh_1u_L5RfEi3VM</recordid><startdate>20210101</startdate><enddate>20210101</enddate><creator>Deart, Vladimir</creator><creator>Mankov, Vladimir</creator><creator>Krasnova, Irina</creator><general>FRUCT</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>DOA</scope></search><sort><creationdate>20210101</creationdate><title>Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix</title><author>Deart, 
Vladimir ; Mankov, Vladimir ; Krasnova, Irina</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-d269t-13de7ab52e8cc36ab4785d9d83d6f0f032112ce596bac5ef10c8f36f948a0d183</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2021</creationdate><topic>agglomerative clustering</topic><topic>distance matrix</topic><topic>euclidean distance</topic><topic>extremely randomized trees</topic><topic>Load modeling</topic><topic>machine learning</topic><topic>manhattan distance</topic><topic>Quality of service</topic><topic>random forest</topic><topic>Random forests</topic><topic>random trees embedding</topic><topic>Software defined networking</topic><topic>Supervised learning</topic><topic>traffic classification</topic><topic>Vegetation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Deart, Vladimir</creatorcontrib><creatorcontrib>Mankov, Vladimir</creatorcontrib><creatorcontrib>Krasnova, Irina</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>DOAJ Directory of Open Access Journals</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Deart, Vladimir</au><au>Mankov, Vladimir</au><au>Krasnova, Irina</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix</atitle><btitle>2021 28th Conference of Open Innovations Association 
(FRUCT)</btitle><stitle>FRUCT</stitle><date>2021-01-01</date><risdate>2021</risdate><volume>28</volume><issue>1</issue><spage>81</spage><epage>88</epage><pages>81-88</pages><issn>2305-7254</issn><eissn>2305-7254</eissn><eissn>2343-0737</eissn><eisbn>9526924444</eisbn><eisbn>9789526924441</eisbn><abstract>We are presenting a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous works, we managed to achieve high accuracy (90-95%) on the database of flows known for the model using Machine Learning (Supervised Learning) methods but in a dynamic SDN new network applications and flows appear more often than usual. For detection of new flows it is proposed to use the Agglomerative clustering method, which has never been used to solve the problem of network flow classification, because early approaches to traffic clustering gave insufficient results and the speed of its operation was too low. This paper offers a combination of different Machine Learning methods in such a way that Agglomerative clustering is responsible only for updating the class database, and Supervised Learning methods are responsible for quickly classifying known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling the cluster construction process by determining the distances between flows using the Random Forest and Extra Trees methods. In the experimental part of the study, three more most promising ways of determining distances are given for comparison: Random Trees Embedding, Euclidean and Manhattan distance. Results of clustering of TCP and UDP applications for different number of clusters and different size of the initial sample are presented. 
Experimental studies confirm the effectiveness of using hierarchical clustering in traffic clustering tasks under the condition of controlled cluster construction.</abstract><pub>FRUCT</pub><doi>10.23919/FRUCT50888.2021.9347616</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2305-7254
ispartof	2021 28th Conference of Open Innovations Association (FRUCT), 2021, Vol.28 (1), p.81-88
issn	2305-7254 2305-7254 2343-0737
language	eng
recordid	cdi_ieee_primary_9347616
source	DOAJ Directory of Open Access Journals
subjects	agglomerative clustering ; distance matrix ; euclidean distance ; extremely randomized trees ; Load modeling ; machine learning ; manhattan distance ; Quality of service ; random forest ; Random forests ; random trees embedding ; Software defined networking ; Supervised learning ; traffic classification ; Vegetation
title	Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T06%3A19%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-doaj_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Agglomerative%20Clustering%20of%20Network%20Traffic%20Based%20on%20Various%20Approaches%20to%20Determining%20the%20Distance%20Matrix&rft.btitle=2021%2028th%20Conference%20of%20Open%20Innovations%20Association%20(FRUCT)&rft.au=Deart,%20Vladimir&rft.date=2021-01-01&rft.volume=28&rft.issue=1&rft.spage=81&rft.epage=88&rft.pages=81-88&rft.issn=2305-7254&rft.eissn=2305-7254&rft_id=info:doi/10.23919/FRUCT50888.2021.9347616&rft_dat=%3Cdoaj_ieee_%3Eoai_doaj_org_article_3f7c20f1011d46dfa5d5659f15e14701%3C/doaj_ieee_%3E%3Curl%3E%3C/url%3E&rft.eisbn=9526924444&rft.eisbn_list=9789526924441&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9347616&rft_doaj_id=oai_doaj_org_article_3f7c20f1011d46dfa5d5659f15e14701&rfr_iscdi=true