Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix

We present a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous work, we achieved high accuracy (90-95%) on flows already known to the model using Machine Learning (Supervised Learning) m...

Bibliographic details
Main authors: Deart, Vladimir; Mankov, Vladimir; Krasnova, Irina
Format: Conference proceeding
Language: eng
Subjects:
Online access: Full text
container_end_page 88
container_issue 1
container_start_page 81
container_title
container_volume 28
creator Deart, Vladimir
Mankov, Vladimir
Krasnova, Irina
description We present a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous work, we achieved high accuracy (90-95%) on flows already known to the model using Machine Learning (Supervised Learning) methods, but in a dynamic SDN new network applications and flows appear more often than usual. To detect new flows, we propose the Agglomerative clustering method, which has not previously been used for network flow classification because early approaches to traffic clustering gave insufficient results and ran too slowly. This paper combines different Machine Learning methods so that Agglomerative clustering is responsible only for updating the class database, while Supervised Learning methods quickly classify known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling cluster construction: the distances between flows are determined using the Random Forest and Extra Trees methods. For comparison, the experimental part of the study evaluates three further promising ways of determining distances: Random Trees Embedding, Euclidean distance, and Manhattan distance. Clustering results for TCP and UDP applications are presented for different numbers of clusters and different initial sample sizes. The experiments confirm that hierarchical clustering is effective for traffic clustering, provided that cluster construction is controlled.
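The abstract's comparison baselines (Euclidean and Manhattan distance feeding an agglomerative clusterer) can be illustrated with a small sketch. This is not the authors' implementation: the flow features, their values, and the choice of average linkage are all assumptions made for illustration, and the tree-based distances (Random Forest, Extra Trees, Random Trees Embedding) are not reproduced here. The sketch uses only the Python standard library.

```python
from itertools import combinations
from math import sqrt


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(points, metric):
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = metric(points[i], points[j])
    return dist


def agglomerative(dist, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed matrix.

    Average linkage is an assumption; the record does not state the linkage.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical flow features: (mean packet size in bytes, mean inter-arrival ms)
flows = [(60.0, 2.0), (64.0, 2.5), (1400.0, 20.0), (1380.0, 22.0), (62.0, 1.8)]

labels_manhattan = agglomerative(distance_matrix(flows, manhattan), 2)
labels_euclidean = agglomerative(distance_matrix(flows, euclidean), 2)
print(labels_manhattan, labels_euclidean)
```

Because the clusterer takes a precomputed matrix, any other way of determining distances could be substituted for `manhattan` or `euclidean` without changing the clustering step, which is the separation the abstract describes.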
doi_str_mv 10.23919/FRUCT50888.2021.9347616
format Conference Proceeding
fullrecord <record><control><sourceid>doaj_ieee_</sourceid><recordid>TN_cdi_ieee_primary_9347616</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9347616</ieee_id><doaj_id>oai_doaj_org_article_3f7c20f1011d46dfa5d5659f15e14701</doaj_id><sourcerecordid>oai_doaj_org_article_3f7c20f1011d46dfa5d5659f15e14701</sourcerecordid><originalsourceid>FETCH-LOGICAL-d269t-13de7ab52e8cc36ab4785d9d83d6f0f032112ce596bac5ef10c8f36f948a0d183</originalsourceid><addsrcrecordid>eNpN0NtKAzEQBuAoCh6fwJu8QGsOm2xyWeupUBWkertMk0mb2jYlG09v72pFnJsZBv6PYQihnPWFtNyeXz8-DSeKGWP6ggnet7KqNdc75Mgqoa2outolh0Iy1auFqvb-zQfktG0XjDFhlLa2PiSLwWy2TCvMUOIb0uHytS2Y43pGU6D3WN5TfqGTDCFERy-gRU_Tmj5Djum1pYPNJidwc2xpSfQSu-gqrr_TZY70MrYF1g7pHZQcP07IfoBli6e__Zg8XV9Nhre98cPNaDgY93x3fulx6bGGqRJonJMaplVtlLfeSK8DC0wKzoVDZfUUnMLAmTNB6mArA8xzI4_JaOv6BItmk-MK8meTIDY_i5RnDeQS3RIbGWonWEdw7ivtAyivtLKBK-RVzXhnnW2tiIh_1u_L5RfEi3VM</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix</title><source>DOAJ Directory of Open Access Journals</source><creator>Deart, Vladimir ; Mankov, Vladimir ; Krasnova, Irina</creator><creatorcontrib>Deart, Vladimir ; Mankov, Vladimir ; Krasnova, Irina</creatorcontrib><description>We are presenting a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous works, we managed to achieve high accuracy (90-95%) on the database of flows known for the model using Machine Learning (Supervised Learning) methods but in a dynamic SDN new network applications and flows appear more often than usual. 
For detection of new flows it is proposed to use the Agglomerative clustering method, which has never been used to solve the problem of network flow classification, because early approaches to traffic clustering gave insufficient results and the speed of its operation was too low. This paper offers a combination of different Machine Learning methods in such a way that Agglomerative clustering is responsible only for updating the class database, and Supervised Learning methods are responsible for quickly classifying known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling the cluster construction process by determining the distances between flows using the Random Forest and Extra Trees methods. In the experimental part of the study, three more most promising ways of determining distances are given for comparison: Random Trees Embedding, Euclidean and Manhattan distance. Results of clustering of TCP and UDP applications for different number of clusters and different size of the initial sample are presented. 
Experimental studies confirm the effectiveness of using hierarchical clustering in traffic clustering tasks under the condition of controlled cluster construction.</description><identifier>ISSN: 2305-7254</identifier><identifier>EISSN: 2305-7254</identifier><identifier>EISSN: 2343-0737</identifier><identifier>EISBN: 9526924444</identifier><identifier>EISBN: 9789526924441</identifier><identifier>DOI: 10.23919/FRUCT50888.2021.9347616</identifier><language>eng</language><publisher>FRUCT</publisher><subject>agglomerative clustering ; distance matrix ; euclidean distance ; extremely randomized trees ; Load modeling ; machine learning ; manhattan distance ; Quality of service ; random forest ; Random forests ; random trees embedding ; Software defined networking ; Supervised learning ; traffic classification ; Vegetation</subject><ispartof>2021 28th Conference of Open Innovations Association (FRUCT), 2021, Vol.28 (1), p.81-88</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>309,310,314,780,784,789,790,864,2102,23930,23931,25140,27924,27925</link.rule.ids></links><search><creatorcontrib>Deart, Vladimir</creatorcontrib><creatorcontrib>Mankov, Vladimir</creatorcontrib><creatorcontrib>Krasnova, Irina</creatorcontrib><title>Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix</title><title>2021 28th Conference of Open Innovations Association (FRUCT)</title><addtitle>FRUCT</addtitle><description>We are presenting a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). 
In previous works, we managed to achieve high accuracy (90-95%) on the database of flows known for the model using Machine Learning (Supervised Learning) methods but in a dynamic SDN new network applications and flows appear more often than usual. For detection of new flows it is proposed to use the Agglomerative clustering method, which has never been used to solve the problem of network flow classification, because early approaches to traffic clustering gave insufficient results and the speed of its operation was too low. This paper offers a combination of different Machine Learning methods in such a way that Agglomerative clustering is responsible only for updating the class database, and Supervised Learning methods are responsible for quickly classifying known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling the cluster construction process by determining the distances between flows using the Random Forest and Extra Trees methods. In the experimental part of the study, three more most promising ways of determining distances are given for comparison: Random Trees Embedding, Euclidean and Manhattan distance. Results of clustering of TCP and UDP applications for different number of clusters and different size of the initial sample are presented. 
Experimental studies confirm the effectiveness of using hierarchical clustering in traffic clustering tasks under the condition of controlled cluster construction.</description><subject>agglomerative clustering</subject><subject>distance matrix</subject><subject>euclidean distance</subject><subject>extremely randomized trees</subject><subject>Load modeling</subject><subject>machine learning</subject><subject>manhattan distance</subject><subject>Quality of service</subject><subject>random forest</subject><subject>Random forests</subject><subject>random trees embedding</subject><subject>Software defined networking</subject><subject>Supervised learning</subject><subject>traffic classification</subject><subject>Vegetation</subject><issn>2305-7254</issn><issn>2305-7254</issn><issn>2343-0737</issn><isbn>9526924444</isbn><isbn>9789526924441</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2021</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpN0NtKAzEQBuAoCh6fwJu8QGsOm2xyWeupUBWkertMk0mb2jYlG09v72pFnJsZBv6PYQihnPWFtNyeXz8-DSeKGWP6ggnet7KqNdc75Mgqoa2outolh0Iy1auFqvb-zQfktG0XjDFhlLa2PiSLwWy2TCvMUOIb0uHytS2Y43pGU6D3WN5TfqGTDCFERy-gRU_Tmj5Djum1pYPNJidwc2xpSfQSu-gqrr_TZY70MrYF1g7pHZQcP07IfoBli6e__Zg8XV9Nhre98cPNaDgY93x3fulx6bGGqRJonJMaplVtlLfeSK8DC0wKzoVDZfUUnMLAmTNB6mArA8xzI4_JaOv6BItmk-MK8meTIDY_i5RnDeQS3RIbGWonWEdw7ivtAyivtLKBK-RVzXhnnW2tiIh_1u_L5RfEi3VM</recordid><startdate>20210101</startdate><enddate>20210101</enddate><creator>Deart, Vladimir</creator><creator>Mankov, Vladimir</creator><creator>Krasnova, Irina</creator><general>FRUCT</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>DOA</scope></search><sort><creationdate>20210101</creationdate><title>Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix</title><author>Deart, 
Vladimir ; Mankov, Vladimir ; Krasnova, Irina</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-d269t-13de7ab52e8cc36ab4785d9d83d6f0f032112ce596bac5ef10c8f36f948a0d183</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2021</creationdate><topic>agglomerative clustering</topic><topic>distance matrix</topic><topic>euclidean distance</topic><topic>extremely randomized trees</topic><topic>Load modeling</topic><topic>machine learning</topic><topic>manhattan distance</topic><topic>Quality of service</topic><topic>random forest</topic><topic>Random forests</topic><topic>random trees embedding</topic><topic>Software defined networking</topic><topic>Supervised learning</topic><topic>traffic classification</topic><topic>Vegetation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Deart, Vladimir</creatorcontrib><creatorcontrib>Mankov, Vladimir</creatorcontrib><creatorcontrib>Krasnova, Irina</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>DOAJ Directory of Open Access Journals</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Deart, Vladimir</au><au>Mankov, Vladimir</au><au>Krasnova, Irina</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix</atitle><btitle>2021 28th Conference of Open Innovations Association 
(FRUCT)</btitle><stitle>FRUCT</stitle><date>2021-01-01</date><risdate>2021</risdate><volume>28</volume><issue>1</issue><spage>81</spage><epage>88</epage><pages>81-88</pages><issn>2305-7254</issn><eissn>2305-7254</eissn><eissn>2343-0737</eissn><eisbn>9526924444</eisbn><eisbn>9789526924441</eisbn><abstract>We are presenting a real-time traffic flow classification model for maintaining QoS in dynamic networks such as Software Defined Networks (SDN). In previous works, we managed to achieve high accuracy (90-95%) on the database of flows known for the model using Machine Learning (Supervised Learning) methods but in a dynamic SDN new network applications and flows appear more often than usual. For detection of new flows it is proposed to use the Agglomerative clustering method, which has never been used to solve the problem of network flow classification, because early approaches to traffic clustering gave insufficient results and the speed of its operation was too low. This paper offers a combination of different Machine Learning methods in such a way that Agglomerative clustering is responsible only for updating the class database, and Supervised Learning methods are responsible for quickly classifying known flows, which solves the problem of model speed. Clustering accuracy is improved by automatically controlling the cluster construction process by determining the distances between flows using the Random Forest and Extra Trees methods. In the experimental part of the study, three more most promising ways of determining distances are given for comparison: Random Trees Embedding, Euclidean and Manhattan distance. Results of clustering of TCP and UDP applications for different number of clusters and different size of the initial sample are presented. 
Experimental studies confirm the effectiveness of using hierarchical clustering in traffic clustering tasks under the condition of controlled cluster construction.</abstract><pub>FRUCT</pub><doi>10.23919/FRUCT50888.2021.9347616</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2305-7254
ispartof 2021 28th Conference of Open Innovations Association (FRUCT), 2021, Vol.28 (1), p.81-88
issn 2305-7254
2305-7254
2343-0737
language eng
recordid cdi_ieee_primary_9347616
source DOAJ Directory of Open Access Journals
subjects agglomerative clustering
distance matrix
euclidean distance
extremely randomized trees
Load modeling
machine learning
manhattan distance
Quality of service
random forest
Random forests
random trees embedding
Software defined networking
Supervised learning
traffic classification
Vegetation
title Agglomerative Clustering of Network Traffic Based on Various Approaches to Determining the Distance Matrix
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T06%3A19%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-doaj_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Agglomerative%20Clustering%20of%20Network%20Traffic%20Based%20on%20Various%20Approaches%20to%20Determining%20the%20Distance%20Matrix&rft.btitle=2021%2028th%20Conference%20of%20Open%20Innovations%20Association%20(FRUCT)&rft.au=Deart,%20Vladimir&rft.date=2021-01-01&rft.volume=28&rft.issue=1&rft.spage=81&rft.epage=88&rft.pages=81-88&rft.issn=2305-7254&rft.eissn=2305-7254&rft_id=info:doi/10.23919/FRUCT50888.2021.9347616&rft_dat=%3Cdoaj_ieee_%3Eoai_doaj_org_article_3f7c20f1011d46dfa5d5659f15e14701%3C/doaj_ieee_%3E%3Curl%3E%3C/url%3E&rft.eisbn=9526924444&rft.eisbn_list=9789526924441&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9347616&rft_doaj_id=oai_doaj_org_article_3f7c20f1011d46dfa5d5659f15e14701&rfr_iscdi=true