Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data

Bibliographic details
Published in: IEEE Access, 2018-01, Vol. 6, pp. 11897-11906
Main authors: Peng, Kai; Leung, Victor C. M.; Huang, Qingjia
Format: Article
Language: English
container_end_page 11906
container_issue
container_start_page 11897
container_title IEEE Access
container_volume 6
creator Peng, Kai
Leung, Victor C. M.
Huang, Qingjia
description An intrusion detection system (IDS) provides an important basis for network defense. With the growth of cloud computing and social networks, massive amounts of data are being generated, which inevitably puts heavy pressure on IDS. It has therefore become crucial to efficiently divide big data into classes according to their features; the class information can then be used to determine whether a given behavior is normal. Although K-means-based clustering for IDS has been well studied, applying it directly in a big data environment is problematic. On the one hand, the efficiency of data clustering needs to be improved. On the other hand, unlike classification, clustering has no unified evaluation indicator, so it is necessary to study which indicator is best suited to evaluating IDS clustering results. In this paper, we propose a clustering method for IDS based on Mini Batch K-means combined with principal component analysis (PCA). First, a preprocessing method digitizes the string-valued features, and the data set is then normalized to improve clustering efficiency. Second, PCA reduces the dimension of the processed data set to further improve efficiency, after which mini batch K-means clusters the data. More specifically, we use K-means++ to initialize the cluster centers to help the algorithm avoid getting trapped in a local optimum, and we choose the Calinski-Harabasz indicator so that the clustering result can be determined more easily. Compared with other methods, the experimental results and the time complexity analysis show that the proposed method is effective and efficient. Overall, the proposed clustering method can be used for IDS in big data environments.
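The pipeline in the abstract (digitize strings, normalize, reduce with PCA, cluster with mini batch K-means seeded by K-means++, score with the Calinski-Harabasz index) can be illustrated with the following pure-Python sketch. It is not the authors' implementation, and several details are assumptions because the record does not specify them: string features are "digitized" by integer label encoding, normalization is min-max scaling to [0, 1], PCA is computed by power iteration with deflation on the sample covariance matrix, and each mini-batch center moves toward its assigned points with learning rate 1/(number of points it has absorbed), a common mini-batch K-means update. The function names (`encode_and_normalize`, `pca`, `kmeans_pp_init`, `mini_batch_kmeans`, `calinski_harabasz`), their defaults, and the toy records are hypothetical.

```python
import math
import random


def encode_and_normalize(rows, categorical_cols):
    """Assumed preprocessing: integer-encode string columns, then min-max scale to [0, 1]."""
    codebooks = {j: {} for j in categorical_cols}
    numeric = []
    for row in rows:
        vec = []
        for j, value in enumerate(row):
            if j in codebooks:  # each unseen string gets the next free integer code
                value = codebooks[j].setdefault(value, len(codebooks[j]))
            vec.append(float(value))
        numeric.append(vec)
    lo = [min(col) for col in zip(*numeric)]
    hi = [max(col) for col in zip(*numeric)]
    # Constant columns carry no information and are mapped to 0.0.
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(r, lo, hi)] for r in numeric]


def pca(data, n_components, iters=200, seed=0):
    """Assumed PCA variant: power iteration with deflation on the sample covariance."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    centred = [[x - m for x, m in zip(row, mean)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        if lam < 1e-12:  # remaining variance is numerically zero
            break
        components.append(v)
        # Deflate so the next power iteration finds the next-largest eigenvector.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in centred]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, centers):
    return min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))


def kmeans_pp_init(data, k, rng):
    """K-means++ seeding: pick each new center with probability proportional to D(x)^2."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in data]
        if sum(d2) == 0:  # every point already coincides with a center
            centers.append(list(rng.choice(data)))
        else:
            centers.append(list(rng.choices(data, weights=d2)[0]))
    return centers


def mini_batch_kmeans(data, k, batch_size=64, n_iter=100, seed=0):
    """Mini-batch K-means; assumed update moves a center with rate 1 / (points it has absorbed)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(data, k, rng)
    counts = [0] * k
    for _ in range(n_iter):
        batch = rng.sample(data, min(batch_size, len(data)))
        assigned = [nearest(x, centers) for x in batch]  # assign the batch, then update
        for x, c in zip(batch, assigned):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = [(1.0 - eta) * a + eta * b for a, b in zip(centers[c], x)]
    return centers, [nearest(x, centers) for x in data]


def calinski_harabasz(data, labels):
    """(between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k))."""
    n, clusters = len(data), sorted(set(labels))
    k = len(clusters)
    if not 2 <= k < n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = [sum(col) / n for col in zip(*data)]
    between = within = 0.0
    for q in clusters:
        members = [x for x, lab in zip(data, labels) if lab == q]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        between += len(members) * sq_dist(centroid, overall)
        within += sum(sq_dist(x, centroid) for x in members)
    return math.inf if within == 0.0 else (between / (k - 1)) / (within / (n - k))


# Hypothetical toy (protocol, duration, bytes) records, made up purely for illustration.
rows = [("tcp", 1.0, 200.0)] * 20 + [("udp", 50.0, 9000.0)] * 20
X = pca(encode_and_normalize(rows, categorical_cols={0}), n_components=2)
centers, labels = mini_batch_kmeans(X, k=2, batch_size=16)
```

Because the Calinski-Harabasz index grows when clusters are dense and well separated, one way to use it in the spirit of the abstract is to run `mini_batch_kmeans` for several candidate `k` values on real data and keep the `k` with the highest `calinski_harabasz(X, labels)`. On the toy data each group is a single repeated record, so the two groups always land in separate clusters.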
doi_str_mv 10.1109/ACCESS.2018.2810267
format Article
publisher IEEE, Piscataway
coden IAECCG
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
toplevel peer_reviewed
oa free_for_read
orcidid https://orcid.org/0000-0001-9148-7616
linktohtml https://ieeexplore.ieee.org/document/8304564
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE Access, 2018-01, Vol. 6, pp. 11897-11906
issn 2169-3536
2169-3536
language eng
recordid cdi_ieee_primary_8304564
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Algorithms
Big Data
Classification algorithms
Cloud computing
Clustering
Clustering algorithms
Clustering methods
Data mining
Datasets
Efficiency
Evaluation
IDS
Intrusion detection
Intrusion detection systems
mini batch Kmeans
Principal component analysis
Principal components analysis
Social networks
title Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T03%3A49%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20Approach%20Based%20on%20Mini%20Batch%20Kmeans%20for%20Intrusion%20Detection%20System%20Over%20Big%20Data&rft.jtitle=IEEE%20access&rft.au=Peng,%20Kai&rft.date=2018-01-01&rft.volume=6&rft.spage=11897&rft.epage=11906&rft.pages=11897-11906&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2018.2810267&rft_dat=%3Cproquest_ieee_%3E2455860684%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455860684&rft_id=info:pmid/&rft_ieee_id=8304564&rft_doaj_id=oai_doaj_org_article_58cb4c80511f45a89deaba4cc815b4d2&rfr_iscdi=true