Efficient high dimension data clustering using constraint-partitioning K-means algorithm

With the ever-increasing size of data, clustering of high-dimensional databases is a demanding task that must satisfy the requirements of both computational efficiency and result quality. To achieve both, clustering in the feature space rather than the original data space has rece...

Detailed description

Bibliographic details
Published in:	International arab journal of information technology 2013-11, Vol.10 (6)
Main author:	Jurj, Aloysius
Format:	Article
Language:	eng
Subjects:	Data mining Data structures (Computer science) التنقيب في البيانات
Online access:	Full text

container_end_page
container_issue	6
container_start_page
container_title	International arab journal of information technology
container_volume	10
creator	Jurj, Aloysius
description	With the ever-increasing size of data, clustering of high-dimensional databases is a demanding task that must satisfy the requirements of both computational efficiency and result quality. To achieve both, clustering in the feature space rather than the original data space has gained importance among data mining researchers. Accordingly, we clustered a high-dimensional dataset using the Constraint-Partitioning K-Means algorithm, which proved ill-suited to high-dimensional data in terms of both effectiveness and efficiency: because of the intrinsic sparsity of such data, it produced indefinite and inaccurate clusters. Hence, we cluster high-dimensional datasets in two steps. First, we reduce the dimensionality of the dataset using Principal Component Analysis as a preprocessing step. Then, we apply the Constraint-Partitioning K-Means algorithm to the dimension-reduced dataset to produce accurate clusters. The approach is evaluated on high-dimensional datasets such as the Parkinson's dataset and the Ionosphere dataset. The experimental results show that the proposed approach is effective in producing accurate and precise clusters.
format	Article
fullrecord	<record><control><sourceid>emarefa</sourceid><recordid>TN_cdi_emarefa_primary_311839</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>311839</sourcerecordid><originalsourceid>FETCH-LOGICAL-e92t-e466812119c6b9cd303b3ff2944a97e4b89f8e498cc7af02b73e369c62481b0c3</originalsourceid><addsrcrecordid>eNpNjM1qwzAQhEVpoSHNIxT0AgJLK2TpWEL6QwO95NBbWMsrW8VWgqQc-vZ1aA-dw8zHMMwNW0ljQYB09vYf37NNKV_NInDKtO2Kfe5CiD5SqnyMw8j7OFMq8ZR4jxW5ny6lUo5p4JdydX9KpWaMqYoz5hrrMr3272ImTIXjNJxyrOP8wO4CToU2f7lmh-fdYfsq9h8vb9unvSCnqiBtjJVKSudN53wPDXQQgnJao2tJd9YFS9pZ71sMjepaIDDLWGkru8bDmj3-3tKMmQIezzku9H0EKS04-AFDVE5N</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Efficient high dimension data clustering using constraint-partitioning K-means algorithm</title><source>EZB-FREE-00999 freely available EZB journals</source><creator>Jurj, Aloysius</creator><creatorcontrib>Jurj, Aloysius</creatorcontrib><description>with the ever increasing size of data, clustering of large dimensional databases poses a demanding task that should satisfy both the requirements of the computation efficiency and result quality. In order to achieve both tasks, clustering of feature space rather than the original data space has received importance among the data mining researchers. Accordingly, we performed data clustering of high dimension dataset using Constraint-Partitioning K-Means clustering algorithm which did not fit properly to cluster high dimensional data sets in terms of effectiveness and efficiency, because of the intrinsic sparse of high dimensional data and resulted in producing indefinite and inaccurate clusters. Hence, we carry out two steps for clustering high dimension dataset. Initially, we perform dimensionality reduction on the high dimension dataset using Principal Component Analysis as a preprocessing step to data clustering. 
Later, we integrate the Constraint-Partitioning KMeans clustering algorithm to the dimension reduced dataset to produce good and accurate clusters. The performance of the approach is evaluated with high dimensional datasets such as Parkinson’s dataset and Ionosphere dataset. The experimental results showed that the proposed approach is very effective in producing accurate and precise clusters.</description><identifier>ISSN: 1683-3198</identifier><identifier>EISSN: 1683-3198</identifier><language>eng</language><publisher>Zarqa, Jordan: Zarqa University</publisher><subject>Data mining ; Data structures (Computer science) ; التنقيب في البيانات</subject><ispartof>International arab journal of information technology, 2013-11, Vol.10 (6)</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784</link.rule.ids></links><search><creatorcontrib>Jurj, Aloysius</creatorcontrib><title>Efficient high dimension data clustering using constraint-partitioning K-means algorithm</title><title>International arab journal of information technology</title><description>with the ever increasing size of data, clustering of large dimensional databases poses a demanding task that should satisfy both the requirements of the computation efficiency and result quality. In order to achieve both tasks, clustering of feature space rather than the original data space has received importance among the data mining researchers. Accordingly, we performed data clustering of high dimension dataset using Constraint-Partitioning K-Means clustering algorithm which did not fit properly to cluster high dimensional data sets in terms of effectiveness and efficiency, because of the intrinsic sparse of high dimensional data and resulted in producing indefinite and inaccurate clusters. 
Hence, we carry out two steps for clustering high dimension dataset. Initially, we perform dimensionality reduction on the high dimension dataset using Principal Component Analysis as a preprocessing step to data clustering. Later, we integrate the Constraint-Partitioning KMeans clustering algorithm to the dimension reduced dataset to produce good and accurate clusters. The performance of the approach is evaluated with high dimensional datasets such as Parkinson’s dataset and Ionosphere dataset. The experimental results showed that the proposed approach is very effective in producing accurate and precise clusters.</description><subject>Data mining</subject><subject>Data structures (Computer science)</subject><subject>التنقيب في البيانات</subject><issn>1683-3198</issn><issn>1683-3198</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNpNjM1qwzAQhEVpoSHNIxT0AgJLK2TpWEL6QwO95NBbWMsrW8VWgqQc-vZ1aA-dw8zHMMwNW0ljQYB09vYf37NNKV_NInDKtO2Kfe5CiD5SqnyMw8j7OFMq8ZR4jxW5ny6lUo5p4JdydX9KpWaMqYoz5hrrMr3272ImTIXjNJxyrOP8wO4CToU2f7lmh-fdYfsq9h8vb9unvSCnqiBtjJVKSudN53wPDXQQgnJao2tJd9YFS9pZ71sMjepaIDDLWGkru8bDmj3-3tKMmQIezzku9H0EKS04-AFDVE5N</recordid><startdate>201311</startdate><enddate>201311</enddate><creator>Jurj, Aloysius</creator><general>Zarqa University</general><scope>ADJCN</scope><scope>AGZBS</scope><scope>AHFXO</scope></search><sort><creationdate>201311</creationdate><title>Efficient high dimension data clustering using constraint-partitioning K-means algorithm</title><author>Jurj, Aloysius</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-e92t-e466812119c6b9cd303b3ff2944a97e4b89f8e498cc7af02b73e369c62481b0c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Data mining</topic><topic>Data structures (Computer science)</topic><topic>التنقيب في 
البيانات</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jurj, Aloysius</creatorcontrib><collection>الدوريات العلمية والإحصائية - e-Marefa Academic and Statistical Periodicals</collection><collection>قاعدة العلوم الاجتماعية - e-Marefa Social Sciences</collection><collection>معرفة - المحتوى العربي الأكاديمي المتكامل - e-Marefa Academic Complete</collection><jtitle>International arab journal of information technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jurj, Aloysius</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Efficient high dimension data clustering using constraint-partitioning K-means algorithm</atitle><jtitle>International arab journal of information technology</jtitle><date>2013-11</date><risdate>2013</risdate><volume>10</volume><issue>6</issue><issn>1683-3198</issn><eissn>1683-3198</eissn><abstract>with the ever increasing size of data, clustering of large dimensional databases poses a demanding task that should satisfy both the requirements of the computation efficiency and result quality. In order to achieve both tasks, clustering of feature space rather than the original data space has received importance among the data mining researchers. Accordingly, we performed data clustering of high dimension dataset using Constraint-Partitioning K-Means clustering algorithm which did not fit properly to cluster high dimensional data sets in terms of effectiveness and efficiency, because of the intrinsic sparse of high dimensional data and resulted in producing indefinite and inaccurate clusters. Hence, we carry out two steps for clustering high dimension dataset. Initially, we perform dimensionality reduction on the high dimension dataset using Principal Component Analysis as a preprocessing step to data clustering. 
Later, we integrate the Constraint-Partitioning KMeans clustering algorithm to the dimension reduced dataset to produce good and accurate clusters. The performance of the approach is evaluated with high dimensional datasets such as Parkinson’s dataset and Ionosphere dataset. The experimental results showed that the proposed approach is very effective in producing accurate and precise clusters.</abstract><cop>Zarqa, Jordan</cop><pub>Zarqa University</pub><tpages>10</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1683-3198
ispartof	International arab journal of information technology, 2013-11, Vol.10 (6)
issn	1683-3198 1683-3198
language	eng
recordid	cdi_emarefa_primary_311839
source	EZB-FREE-00999 freely available EZB journals
subjects	Data mining Data structures (Computer science) التنقيب في البيانات
title	Efficient high dimension data clustering using constraint-partitioning K-means algorithm
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T00%3A24%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-emarefa&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficient%20high%20dimension%20data%20clustering%20using%20constraint-partitioning%20K-means%20algorithm&rft.jtitle=International%20arab%20journal%20of%20information%20technology&rft.au=Jurj,%20Aloysius&rft.date=2013-11&rft.volume=10&rft.issue=6&rft.issn=1683-3198&rft.eissn=1683-3198&rft_id=info:doi/&rft_dat=%3Cemarefa%3E311839%3C/emarefa%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
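
The two-step approach summarized in the description field (dimensionality reduction with Principal Component Analysis, then clustering of the reduced data) can be sketched roughly as follows. This is an illustration under stated assumptions, not the article's implementation: the record does not define the Constraint-Partitioning K-Means variant, so plain Lloyd's K-means stands in for it, and the function names (`pca_reduce`, `kmeans`, `make_toy_data`), the synthetic data, and all parameter values are hypothetical. NumPy is assumed to be installed.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means (a stand-in, NOT the constraint-partitioning variant)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def make_toy_data(seed=0):
    """Hypothetical data: two well-separated Gaussian blobs in 20 dimensions."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, size=(50, 20))
    b = rng.normal(8.0, 1.0, size=(50, 20))
    return np.vstack([a, b])


# Step 1: reduce 20 dimensions to 2; Step 2: cluster the reduced data.
X = make_toy_data()
X_reduced = pca_reduce(X, n_components=2)
labels, centers = kmeans(X_reduced, k=2)
```

On real data such as the Parkinson's or Ionosphere datasets named in the description, the number of retained components and the number of clusters would have to be chosen per dataset; the values above are placeholders.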