Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data
An intrusion detection system (IDS) provides an important basis for network defense. With the development of cloud computing and social networks, massive amounts of data are generated, which inevitably puts pressure on IDS. It therefore becomes crucial to efficiently divide the dat...
Published in: | IEEE access 2018-01, Vol.6, p.11897-11906 |
---|---|
Main authors: | Peng, Kai ; Leung, Victor C. M. ; Huang, Qingjia |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 11906 |
---|---|
container_issue | |
container_start_page | 11897 |
container_title | IEEE access |
container_volume | 6 |
creator | Peng, Kai ; Leung, Victor C. M. ; Huang, Qingjia |
description | An intrusion detection system (IDS) provides an important basis for network defense. With the development of cloud computing and social networks, massive amounts of data are generated, which inevitably puts pressure on IDS. It therefore becomes crucial to efficiently divide big data into different classes according to data features; the class information can then be used to determine whether a given behavior is normal. Although K-means clustering for IDS has been well studied, applying it directly in a big data environment may be inappropriate. On the one hand, the efficiency of data clustering needs to be improved. On the other hand, unlike classification, clustering has no unified evaluation indicator, so it is necessary to study which indicator is most suitable for evaluating the clustering results of IDS. In this paper, we propose a clustering method for IDS based on Mini Batch K-means combined with principal component analysis (PCA). First, a preprocessing method digitizes the strings, and the data set is then normalized to improve clustering efficiency. Second, PCA is used to reduce the dimension of the processed data set to further improve clustering efficiency, and Mini Batch K-means is then used for data clustering. More specifically, we use K-means++ to initialize the cluster centers in order to keep the algorithm from falling into a local optimum, and we choose the Calinski-Harabasz indicator so that the clustering result is more easily determined. Compared with other methods, the experimental results and the time complexity analysis show that the proposed method is effective and efficient. Above all, the proposed clustering method can be used for IDS in a big data environment. |
doi_str_mv | 10.1109/ACCESS.2018.2810267 |
format | Article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_8304564</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8304564</ieee_id><doaj_id>oai_doaj_org_article_58cb4c80511f45a89deaba4cc815b4d2</doaj_id><sourcerecordid>2455860684</sourcerecordid><originalsourceid>FETCH-LOGICAL-c524t-545665079c5a8b10fc2540e2f2d8fea93215939934ec4162f2a74fd45befdd553</originalsourceid><addsrcrecordid>eNpNUU1PwzAMrRBIIOAXcInEeSNf7tLjKF8TIA6DE4coTZ2RaWtHkiHt35NRhPAl9rPfc-RXFBeMjhmj1dW0rm_n8zGnTI25YpSXk4PihLOyGgkQ5eG__Lg4j3FJc6gMweSkeK9X25gw-G5BpptN6I39INcmYkv6jjz7zucqZexxjaaLxPWBzLoUttHn_g0mtGmfzXdZZU1evjCQa78gNyaZs-LImVXE89_3tHi7u32tH0ZPL_ezevo0ssBlGoGEsgQ6qSwY1TDqLAdJkTveKoemEpxBJapKSLSSlRk3E-laCQ26tgUQp8Vs0G17s9Sb4Ncm7HRvvP4B-rDQJiRvV6hB2UZaRYExJ_O6qkXTGGmtYtDIlmety0Ern-JzizHpZb8NXf6-5hJAlbRUMk-JYcqGPsaA7m8ro3pvih5M0XtT9K8pmXUxsDwi_jGUoPkAUnwDjxqG5Q</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2455860684</pqid></control><display><type>article</type><title>Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Peng, Kai ; Leung, Victor C. M. ; Huang, Qingjia</creator><creatorcontrib>Peng, Kai ; Leung, Victor C. M. ; Huang, Qingjia</creatorcontrib><description><![CDATA[Intrusion detection system (IDS) provides an important basis for the network defense. Due to the development of the cloud computing and social network, massive amounts of data are generated, which inevitably brings much pressure to IDS. And therefore, it becomes crucial to efficiently divide the data into different classes over big data according to data features. 
Moreover, we can further determine whether one is normal behavior or not based on the classes information. Although the clustering approach based on <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means for IDS has been well studied, unfortunately directly using it in big data environment may suffer from inappropriateness. On the one hand, the efficiency of data clustering needs to be improved. On the other hand, differ from the classification, there is no unified evaluation indicator for clustering issue, and thus, it is necessary to study which indicator is more suitable for evaluating the clustering results of IDS. In this paper, we propose a clustering method for IDS based on Mini Batch <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means combined with principal component analysis. First, a preprocessing method is proposed to digitize the strings and then the data set is normalized so as to improve the clustering efficiency. Second, the principal component analysis method is used to reduce the dimension of the processed data set aiming to further improve the clustering efficiency, and then mini batch <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means method is used for data clustering. More specifically, we use <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means++ to initialize the centers of cluster in order to avoid the algorithm getting into the local optimum, in addition, we choose the Calsski Harabasz indicator so that the clustering result is more easily determined. Compared with the other methods, the experimental results and the time complexity analysis show that our proposed method is effective and efficient. 
Above all, our proposed clustering method can be used for IDS over big data environment.]]></description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2018.2810267</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Big Data ; Classification algorithms ; Cloud computing ; Clustering ; Clustering algorithms ; Clustering methods ; Data mining ; Datasets ; Efficiency ; Evaluation ; IDS ; Intrusion detection ; Intrusion detection systems ; mini batch Kmeans ; Principal component analysis ; Principal components analysis ; Social networks</subject><ispartof>IEEE access, 2018-01, Vol.6, p.11897-11906</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c524t-545665079c5a8b10fc2540e2f2d8fea93215939934ec4162f2a74fd45befdd553</citedby><cites>FETCH-LOGICAL-c524t-545665079c5a8b10fc2540e2f2d8fea93215939934ec4162f2a74fd45befdd553</cites><orcidid>0000-0001-9148-7616</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8304564$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,778,782,862,2098,27620,27911,27912,54920</link.rule.ids></links><search><creatorcontrib>Peng, Kai</creatorcontrib><creatorcontrib>Leung, Victor C. M.</creatorcontrib><creatorcontrib>Huang, Qingjia</creatorcontrib><title>Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data</title><title>IEEE access</title><addtitle>Access</addtitle><description><![CDATA[Intrusion detection system (IDS) provides an important basis for the network defense. 
Due to the development of the cloud computing and social network, massive amounts of data are generated, which inevitably brings much pressure to IDS. And therefore, it becomes crucial to efficiently divide the data into different classes over big data according to data features. Moreover, we can further determine whether one is normal behavior or not based on the classes information. Although the clustering approach based on <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means for IDS has been well studied, unfortunately directly using it in big data environment may suffer from inappropriateness. On the one hand, the efficiency of data clustering needs to be improved. On the other hand, differ from the classification, there is no unified evaluation indicator for clustering issue, and thus, it is necessary to study which indicator is more suitable for evaluating the clustering results of IDS. In this paper, we propose a clustering method for IDS based on Mini Batch <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means combined with principal component analysis. First, a preprocessing method is proposed to digitize the strings and then the data set is normalized so as to improve the clustering efficiency. Second, the principal component analysis method is used to reduce the dimension of the processed data set aiming to further improve the clustering efficiency, and then mini batch <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means method is used for data clustering. More specifically, we use <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means++ to initialize the centers of cluster in order to avoid the algorithm getting into the local optimum, in addition, we choose the Calsski Harabasz indicator so that the clustering result is more easily determined. 
Compared with the other methods, the experimental results and the time complexity analysis show that our proposed method is effective and efficient. Above all, our proposed clustering method can be used for IDS over big data environment.]]></description><subject>Algorithms</subject><subject>Big Data</subject><subject>Classification algorithms</subject><subject>Cloud computing</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>Clustering methods</subject><subject>Data mining</subject><subject>Datasets</subject><subject>Efficiency</subject><subject>Evaluation</subject><subject>IDS</subject><subject>Intrusion detection</subject><subject>Intrusion detection systems</subject><subject>mini batch Kmeans</subject><subject>Principal component analysis</subject><subject>Principal components analysis</subject><subject>Social networks</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUU1PwzAMrRBIIOAXcInEeSNf7tLjKF8TIA6DE4coTZ2RaWtHkiHt35NRhPAl9rPfc-RXFBeMjhmj1dW0rm_n8zGnTI25YpSXk4PihLOyGgkQ5eG__Lg4j3FJc6gMweSkeK9X25gw-G5BpptN6I39INcmYkv6jjz7zucqZexxjaaLxPWBzLoUttHn_g0mtGmfzXdZZU1evjCQa78gNyaZs-LImVXE89_3tHi7u32tH0ZPL_ezevo0ssBlGoGEsgQ6qSwY1TDqLAdJkTveKoemEpxBJapKSLSSlRk3E-laCQ26tgUQp8Vs0G17s9Sb4Ncm7HRvvP4B-rDQJiRvV6hB2UZaRYExJ_O6qkXTGGmtYtDIlmety0Ern-JzizHpZb8NXf6-5hJAlbRUMk-JYcqGPsaA7m8ro3pvih5M0XtT9K8pmXUxsDwi_jGUoPkAUnwDjxqG5Q</recordid><startdate>20180101</startdate><enddate>20180101</enddate><creator>Peng, Kai</creator><creator>Leung, Victor C. M.</creator><creator>Huang, Qingjia</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-9148-7616</orcidid></search><sort><creationdate>20180101</creationdate><title>Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data</title><author>Peng, Kai ; Leung, Victor C. M. ; Huang, Qingjia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c524t-545665079c5a8b10fc2540e2f2d8fea93215939934ec4162f2a74fd45befdd553</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Algorithms</topic><topic>Big Data</topic><topic>Classification algorithms</topic><topic>Cloud computing</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>Clustering methods</topic><topic>Data mining</topic><topic>Datasets</topic><topic>Efficiency</topic><topic>Evaluation</topic><topic>IDS</topic><topic>Intrusion detection</topic><topic>Intrusion detection systems</topic><topic>mini batch Kmeans</topic><topic>Principal component analysis</topic><topic>Principal components analysis</topic><topic>Social networks</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Peng, Kai</creatorcontrib><creatorcontrib>Leung, Victor C. 
M.</creatorcontrib><creatorcontrib>Huang, Qingjia</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Peng, Kai</au><au>Leung, Victor C. M.</au><au>Huang, Qingjia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2018-01-01</date><risdate>2018</risdate><volume>6</volume><spage>11897</spage><epage>11906</epage><pages>11897-11906</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract><![CDATA[Intrusion detection system (IDS) provides an important basis for the network defense. Due to the development of the cloud computing and social network, massive amounts of data are generated, which inevitably brings much pressure to IDS. 
And therefore, it becomes crucial to efficiently divide the data into different classes over big data according to data features. Moreover, we can further determine whether one is normal behavior or not based on the classes information. Although the clustering approach based on <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means for IDS has been well studied, unfortunately directly using it in big data environment may suffer from inappropriateness. On the one hand, the efficiency of data clustering needs to be improved. On the other hand, differ from the classification, there is no unified evaluation indicator for clustering issue, and thus, it is necessary to study which indicator is more suitable for evaluating the clustering results of IDS. In this paper, we propose a clustering method for IDS based on Mini Batch <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means combined with principal component analysis. First, a preprocessing method is proposed to digitize the strings and then the data set is normalized so as to improve the clustering efficiency. Second, the principal component analysis method is used to reduce the dimension of the processed data set aiming to further improve the clustering efficiency, and then mini batch <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means method is used for data clustering. More specifically, we use <inline-formula> <tex-math notation="LaTeX">K </tex-math></inline-formula>-means++ to initialize the centers of cluster in order to avoid the algorithm getting into the local optimum, in addition, we choose the Calsski Harabasz indicator so that the clustering result is more easily determined. Compared with the other methods, the experimental results and the time complexity analysis show that our proposed method is effective and efficient. 
Above all, our proposed clustering method can be used for IDS over big data environment.]]></abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2018.2810267</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0001-9148-7616</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2018-01, Vol.6, p.11897-11906 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_ieee_primary_8304564 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Algorithms ; Big Data ; Classification algorithms ; Cloud computing ; Clustering ; Clustering algorithms ; Clustering methods ; Data mining ; Datasets ; Efficiency ; Evaluation ; IDS ; Intrusion detection ; Intrusion detection systems ; mini batch Kmeans ; Principal component analysis ; Principal components analysis ; Social networks |
title | Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T03%3A49%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20Approach%20Based%20on%20Mini%20Batch%20Kmeans%20for%20Intrusion%20Detection%20System%20Over%20Big%20Data&rft.jtitle=IEEE%20access&rft.au=Peng,%20Kai&rft.date=2018-01-01&rft.volume=6&rft.spage=11897&rft.epage=11906&rft.pages=11897-11906&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2018.2810267&rft_dat=%3Cproquest_ieee_%3E2455860684%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455860684&rft_id=info:pmid/&rft_ieee_id=8304564&rft_doaj_id=oai_doaj_org_article_58cb4c80511f45a89deaba4cc815b4d2&rfr_iscdi=true |
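
The description field above outlines a pipeline: normalize the data, initialize centers with K-means++, cluster with Mini Batch K-means, and score the result with the Calinski-Harabasz indicator. The following is a minimal sketch of those steps, not the authors' implementation. It uses only the Python standard library, omits the string digitization and PCA steps for brevity, and all function names and parameter defaults (`batch_size`, `n_iter`, `seed`) are hypothetical choices made for illustration.

```python
import random


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans_pp_init(data, k, rng):
    """K-means++ seeding: draw each new center with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in data]
        if sum(weights) == 0:  # all points coincide with a center
            weights = None     # fall back to a uniform draw
        centers.append(list(rng.choices(data, weights=weights, k=1)[0]))
    return centers


def mini_batch_kmeans(data, k, batch_size=16, n_iter=100, seed=0):
    """Mini-batch K-means: update centers from small random batches
    using a per-center learning rate of 1 / (points seen so far)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(data, k, rng)
    counts = [0] * k
    for _ in range(n_iter):
        batch = rng.sample(data, min(batch_size, len(data)))
        assigned = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    labels = [nearest(p, centers) for p in data]
    return centers, labels


def calinski_harabasz(data, labels):
    """Ratio of between-cluster to within-cluster dispersion, each divided
    by its degrees of freedom. Needs at least 2 clusters and nonzero
    within-cluster dispersion; higher values mean better separation."""
    n, dim = len(data), len(data[0])
    overall = [sum(p[d] for p in data) / n for d in range(dim)]
    clusters = {}
    for p, j in zip(data, labels):
        clusters.setdefault(j, []).append(p)
    k = len(clusters)
    between = within = 0.0
    for members in clusters.values():
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sq_dist(mean, overall)
        within += sum(sq_dist(p, mean) for p in members)
    return (between / (k - 1)) / (within / (n - k))
```

A typical use would be to call `min_max_normalize` on numeric feature rows, run `mini_batch_kmeans` for several candidate values of `k`, and keep the `k` with the highest `calinski_harabasz` score. On real traffic data, a library implementation with PCA in front would likely be the more practical choice.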