Clustering Approach Based on Mini Batch Kmeans for Intrusion Detection System Over Big Data

Bibliographic Details
Published in:	IEEE Access 2018-01, Vol.6, p.11897-11906
Authors:	Peng, Kai; Leung, Victor C. M.; Huang, Qingjia
Format:	Article
Language:	eng
Keywords:	Algorithms; Big Data; Classification algorithms; Cloud computing; Clustering; Clustering algorithms; Clustering methods; Data mining; Datasets; Efficiency; Evaluation; IDS; Intrusion detection; Intrusion detection systems; mini batch Kmeans; Principal component analysis; Principal components analysis; Social networks
Online access:	Full text

container_end_page	11906
container_issue
container_start_page	11897
container_title	IEEE access
container_volume	6
creator	Peng, Kai; Leung, Victor C. M.; Huang, Qingjia
description	An intrusion detection system (IDS) provides an important basis for network defense. With the development of cloud computing and social networks, massive amounts of data are generated, which inevitably puts considerable pressure on IDS. It has therefore become crucial to efficiently divide big data into different classes according to their features; based on this class information, one can then determine whether a given behavior is normal. Although K-means-based clustering for IDS has been well studied, applying it directly in a big data environment may be inappropriate. On the one hand, the efficiency of data clustering needs to be improved. On the other hand, unlike classification, clustering has no unified evaluation indicator, so it is necessary to study which indicator is most suitable for evaluating IDS clustering results. In this paper, we propose a clustering method for IDS based on Mini Batch K-means combined with principal component analysis (PCA). First, a preprocessing method is proposed that converts string-valued features into numbers, after which the data set is normalized to improve clustering efficiency. Second, PCA is used to reduce the dimensionality of the processed data set to further improve clustering efficiency, and Mini Batch K-means is then used for data clustering. More specifically, we use K-means++ to initialize the cluster centers to help the algorithm avoid poor local optima, and we choose the Calinski-Harabasz indicator so that the clustering result can be determined more easily. Compared with other methods, the experimental results and the time complexity analysis show that the proposed method is effective and efficient. Overall, the proposed clustering method can be used for IDS in big data environments.
doi_str_mv	10.1109/ACCESS.2018.2810267
format	Article
publisher	IEEE, Piscataway
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
coden	IAECCG
date	2018-01-01
pages	10
orcid	https://orcid.org/0000-0001-9148-7616
status	peer reviewed; free to read
html	https://ieeexplore.ieee.org/document/8304564
identifier	ISSN: 2169-3536
issn	2169-3536 2169-3536
recordid	cdi_ieee_primary_8304564
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T03%3A49%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20Approach%20Based%20on%20Mini%20Batch%20Kmeans%20for%20Intrusion%20Detection%20System%20Over%20Big%20Data&rft.jtitle=IEEE%20access&rft.au=Peng,%20Kai&rft.date=2018-01-01&rft.volume=6&rft.spage=11897&rft.epage=11906&rft.pages=11897-11906&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2018.2810267&rft_dat=%3Cproquest_ieee_%3E2455860684%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2455860684&rft_id=info:pmid/&rft_ieee_id=8304564&rft_doaj_id=oai_doaj_org_article_58cb4c80511f45a89deaba4cc815b4d2&rfr_iscdi=true
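The abstract's first preprocessing step, converting string features to numbers and then normalizing the data set, can be sketched as follows. This is a minimal illustration under assumptions of our own, since the record does not give the paper's exact encoding: it assumes per-column integer codes assigned in order of first appearance, followed by min-max scaling of every column to [0, 1]. The function names and the sample rows are hypothetical.

```python
from typing import Dict, List, Sequence

# Sketch under assumptions: integer label codes + min-max scaling.
# Not the paper's implementation; names and sample data are hypothetical.


def digitize_strings(rows: Sequence[Sequence[object]]) -> List[List[float]]:
    """Replace each string value with a per-column integer code.

    Codes are assigned in order of first appearance; numeric values are
    converted to float unchanged.
    """
    n_cols = len(rows[0])
    tables: List[Dict[str, int]] = [{} for _ in range(n_cols)]  # one string->code map per column
    out: List[List[float]] = []
    for row in rows:
        new_row: List[float] = []
        for j, value in enumerate(row):
            if isinstance(value, str):
                table = tables[j]
                if value not in table:
                    table[value] = len(table)  # next unused code in this column
                new_row.append(float(table[value]))
            else:
                new_row.append(float(value))
        out.append(new_row)
    return out


def min_max_normalize(data: List[List[float]]) -> List[List[float]]:
    """Scale every column to [0, 1]; constant columns map to 0."""
    n_cols = len(data[0])
    lo = [min(r[j] for r in data) for j in range(n_cols)]
    hi = [max(r[j] for r in data) for j in range(n_cols)]
    return [
        [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n_cols)]
        for r in data
    ]


# Hypothetical connection records: duration, protocol, service, bytes.
rows = [
    [0, "tcp", "http", 181],
    [2, "udp", "dns", 45],
    [0, "tcp", "smtp", 1032],
]
X = min_max_normalize(digitize_strings(rows))
print(X)
```

Integer codes impose an arbitrary order on categories; one-hot encoding avoids that at the cost of extra dimensions, which a later dimensionality-reduction step could absorb. In a deployed IDS, the code tables and the per-column minima and maxima would be fitted on training data and reused for new traffic.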
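For the dimensionality-reduction step described in the abstract, the following pure-Python sketch projects mean-centred data onto its leading principal components. It assumes the components are obtained as eigenvectors of the sample covariance matrix via power iteration with deflation, a simple textbook technique chosen here only to keep the example self-contained; the `pca` function and its `iters` and `seed` parameters are illustrative and not taken from the paper.

```python
import math
import random
from typing import List

# Sketch under assumptions: covariance eigenvectors via power iteration
# with deflation. Illustrative only, not the paper's implementation.


def pca(data: List[List[float]], n_components: int, iters: int = 500, seed: int = 0) -> List[List[float]]:
    """Return the scores of each row on the top `n_components` principal components."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components: List[List[float]] = []
    for _ in range(n_components):
        # Random positive start vector, normalised to unit length.
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue, i.e. the variance along v.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate so the next pass converges to the next-largest eigenvector.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    # Scores: coordinates of each centred row along each component.
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centred]


# Points on the line y = 2x collapse to a single coordinate without losing variance.
scores = pca([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]], n_components=1)
print(scores)
```

Power iteration converges slowly when the two leading eigenvalues are nearly equal, so a library SVD (for example `numpy.linalg.svd`) is the more robust choice in practice; the sketch only shows what the projection computes.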
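The clustering and evaluation steps named in the abstract can be sketched together. This is a minimal stand-alone illustration, not the authors' implementation. K-means++ seeding picks the first centre uniformly and each later one with probability proportional to its squared distance D(x)^2 from the nearest existing centre. Each Mini Batch K-means iteration samples a batch (with replacement, an assumption made here), caches the nearest-centre assignments, and then moves each centre toward its assigned points with a per-centre learning rate of 1/count. The Calinski-Harabasz index is CH = [B/(k-1)] / [W/(n-k)], where B is the between-cluster and W the within-cluster sum of squared distances. The helper names, batch size, iteration count, and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Sketch under assumptions: K-means++ seeding, mini-batch updates with a
# 1/count learning rate, batches sampled with replacement. Illustrative only.
Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Sequence[float], centers: List[Vector]) -> int:
    """Index of the centre closest to x."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def kmeans_pp_init(data: List[Vector], k: int, rng: random.Random) -> List[Vector]:
    """K-means++ seeding: first centre uniform, later ones weighted by D(x)^2."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        d2 = [min(sq_dist(x, c) for c in centers) for x in data]
        total = sum(d2)
        if total == 0.0:  # every point already coincides with a centre
            centers.append(list(rng.choice(data)))
            continue
        r = rng.random() * total
        acc = 0.0
        for x, w in zip(data, d2):
            acc += w
            if w > 0.0 and acc >= r:  # never pick an existing centre
                centers.append(list(x))
                break
    return centers


def mini_batch_kmeans(data: List[Vector], k: int, batch_size: int = 32,
                      n_iter: int = 100, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    rng = random.Random(seed)
    centers = kmeans_pp_init(data, k, rng)
    counts = [0] * k  # points absorbed by each centre so far
    for _ in range(n_iter):
        batch = [rng.choice(data) for _ in range(batch_size)]
        assigned = [nearest(x, centers) for x in batch]  # assign before any update
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centre learning rate shrinks over time
            centers[j] = [(1.0 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    labels = [nearest(x, centers) for x in data]
    return centers, labels


def calinski_harabasz(data: List[Vector], labels: Sequence[int]) -> float:
    n, d = len(data), len(data[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of clusters < number of samples")
    overall = [sum(x[i] for x in data) / n for i in range(d)]
    between = within = 0.0
    for c in clusters:
        members = [x for x, lab in zip(data, labels) if lab == c]
        mean = [sum(x[i] for x in members) / len(members) for i in range(d)]
        between += len(members) * sq_dist(mean, overall)
        within += sum(sq_dist(x, mean) for x in members)
    if within == 0.0:
        return float("inf")  # perfectly compact clusters
    return (between / (k - 1)) / (within / (n - k))


# Made-up toy data: two tight, well-separated groups of five points each.
data = [[0.1 * i, 0.0] for i in range(5)] + [[100.0 + 0.1 * i, 100.0] for i in range(5)]
centers, labels = mini_batch_kmeans(data, k=2, batch_size=4, n_iter=50)
print(labels, calinski_harabasz(data, labels))
```

A higher CH value indicates denser, better-separated clusters. One plausible way to use it, which is an assumption about how the paper applies the indicator, is to run the clustering for several values of k and keep the k with the highest score. K-means++ lowers the risk of a poor starting configuration but does not guarantee a global optimum.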