A modified bond energy algorithm with fuzzy merging and its application to Arabic text document clustering

• The study describes the essential phases of Arabic text clustering.
• Bond energy algorithm for text document clustering is presented.
• Fuzzy merge algorithm is explored to improve clustering.
• Several clustering algorithms are compared and evaluated on Arabic datasets.
Conventional textual documents c...

Bibliographic details
Published in: Expert systems with applications 2020-11, Vol.159, p.113598, Article 113598
Main authors: AlMahmoud, Rana Husni; Hammo, Bassam; Faris, Hossam
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page 113598
container_title Expert systems with applications
container_volume 159
creator AlMahmoud, Rana Husni
Hammo, Bassam
Faris, Hossam
description • The study describes the essential phases of Arabic text clustering. • Bond energy algorithm for text document clustering is presented. • Fuzzy merge algorithm is explored to improve clustering. • Several clustering algorithms are compared and evaluated on Arabic datasets. Conventional text document clustering algorithms suffer from several shortcomings, such as slow convergence on very high-dimensional data, sensitivity to initial values, and cluster descriptions that are hard to understand. Although many clustering algorithms have been developed for English and other languages, very few have tackled the problem of clustering documents in the under-resourced Arabic language. In this work, we propose a modified version of the Bond Energy Algorithm (BEA) combined with a fuzzy merging technique to solve the problem of Arabic text document clustering. The proposed algorithm, Clustering Arabic Documents based on Bond Energy, hereafter named CADBE, attempts to identify and display natural variable clusters within very large data sets. CADBE clusters Arabic documents in three steps: the first step instantiates a cluster affinity matrix using the BEA, the second step uses a novel method to partition the cluster matrix automatically into small coherent clusters, and the last step uses a fuzzy merging technique to merge similar clusters based on the associations and interrelations between the resulting clusters. Experimental results showed that the proposed algorithm outperformed conventional clustering algorithms such as Expectation–Maximization (EM), Single Linkage, and UPGMA in terms of clustering purity and entropy. It also outperformed k-means, k-means++, spherical k-means, and CoclusMod in most test cases. CADBE has several further merits. First, unlike traditional clustering algorithms, it does not require the number of clusters to be specified in advance. In addition, it produces clusters with distinct boundaries, which makes its results more objective. Finally, it is deterministic, so it is insensitive to the order in which documents are presented to the algorithm.
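The abstract reports results in terms of clustering purity and entropy. As a rough illustration of how these two measures are commonly computed for a hard clustering against gold class labels, the sketch below may help. It is not taken from the article: the function name `purity_and_entropy`, the base-2 logarithm, and the size-weighted averaging of per-cluster entropy are assumptions, and the authors' exact definitions (for example, any normalization) may differ.

```python
import math
from collections import Counter


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a hard clustering against gold labels.

    Hypothetical helper for illustration; not the article's code.
    Higher purity and lower entropy indicate better agreement.
    """
    if len(cluster_ids) != len(class_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    total = len(cluster_ids)

    # Count how many documents of each gold class fall into each cluster.
    by_cluster = {}
    for cid, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1

    majority_sum = 0
    weighted_entropy = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        # Purity credits each cluster with its most frequent class.
        majority_sum += max(counts.values())
        # Entropy of the class distribution inside this cluster (base 2).
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        # Weight each cluster by its share of all documents.
        weighted_entropy += (size / total) * h

    return majority_sum / total, weighted_entropy


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["sport", "sport", "politics", "politics", "politics", "politics"]
    purity, entropy = purity_and_entropy(clusters, labels)
    print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

In the toy call above, one of the six documents sits in a cluster dominated by another class, so purity falls below 1 and entropy rises above 0; a clustering that matches the classes exactly gives purity 1 and entropy 0.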
doi_str_mv 10.1016/j.eswa.2020.113598
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2454517446</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S095741742030422X</els_id><sourcerecordid>2454517446</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-6a79106b6c36ea0457e06ccc1feceda99e64959454b5942c79100cccfc369f593</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouK7-AU8Bz12TfiQb8LIsfsGCFz2HNJ2uKW1Tk9R199ebUs9eJjDzPJPhReiWkhUllN03K_AHtUpJGhs0K8T6DC3ommcJ4yI7RwsiCp7klOeX6Mr7hhDKCeEL1GxwZytTG6hwafsKQw9uf8Sq3VtnwmeHD7HiejydjriLI9PvsYqcCR6rYWiNVsHYHgeLN06VRuMAPwFXVo8d9AHrdvQBXNSu0UWtWg83f-8SfTw9vm9fkt3b8-t2s0t0lq5DwhQXlLCS6YyBInnBgTCtNa1BQ6WEAJaLQuRFXsaa6okmcV5HXtSFyJbobt47OPs1gg-ysaPr45cyjVYRQ8hZpNKZ0s5676CWgzOdckdJiZwylY2cMpVTpnLONEoPswTx_m8DTnptoI93GQc6yMqa__Rf-IiBHw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2454517446</pqid></control><display><type>article</type><title>A modified bond energy algorithm with fuzzy merging and its application to Arabic text document clustering</title><source>ScienceDirect Journals (5 years ago - present)</source><creator>AlMahmoud, Rana Husni ; Hammo, Bassam ; Faris, Hossam</creator><creatorcontrib>AlMahmoud, Rana Husni ; Hammo, Bassam ; Faris, Hossam</creatorcontrib><description>•The study describes the essential phases of Arabic text clustering.•Bond energy algorithm for text document clustering is presented.•Fuzzy merge algorithm is explored to improve clustering.•Several clustering algorithms are compared and evaluated on Arabic datasets. Conventional textual documents clustering algorithms suffer from several shortcomings, such as the slow convergence of the immense high-dimensional data, the sensitivity to the initial value, and the understandability of the description of the resulted clusters. 
Although many clustering algorithms have been developed for English and other languages, very few have tackled the problem of clustering the under-resourced Arabic language. In this work, we propose a modified version of the Bond Energy Algorithm (BEA) combined with a fuzzy merging technique to solve the problem of Arabic text document clustering. The proposed algorithm, Clustering Arabic Documents based on Bond Energy, hereafter named CADBE, attempts to identify and display natural variable clusters within huge sized data. CADBE has three steps to cluster Arabic documents: the first step instantiates a cluster affinity matrix using the BEA, the second step uses a new and novel method to partition the cluster matrix automatically into small coherent clusters, and the last step uses a fuzzy merging technique to merge similar clusters based on the associations and interrelations between the resulted clusters. Experimental results showed that the proposed algorithm effectively outperformed the conventional clustering algorithms such as Expectation–Maximization (EM), Single Linkage, and UPGMA in terms of clustering purity and entropy. It also outperformed k-means, k-means++, spherical k-means, and CoclusMod in most test cases. However, there are several merits of CADBE. First, unlike the traditional clustering algorithms, it does not require to specify the number of clusters. 
In addition, it produces clusters with distinct boundaries, which makes its results more objective, and finally it is deterministic, such that it is insensitive to the order in which documents are presented to the algorithm.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2020.113598</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><subject>Algorithms ; Arabic text document clustering ; Bond energy ; Bond energy algorithm ; Clustering ; Fuzzy Merging</subject><ispartof>Expert systems with applications, 2020-11, Vol.159, p.113598, Article 113598</ispartof><rights>2020 Elsevier Ltd</rights><rights>Copyright Elsevier BV Nov 30, 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-6a79106b6c36ea0457e06ccc1feceda99e64959454b5942c79100cccfc369f593</citedby><cites>FETCH-LOGICAL-c328t-6a79106b6c36ea0457e06ccc1feceda99e64959454b5942c79100cccfc369f593</cites><orcidid>0000-0003-4261-8127 ; 0000-0002-5270-7409</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.eswa.2020.113598$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>AlMahmoud, Rana Husni</creatorcontrib><creatorcontrib>Hammo, Bassam</creatorcontrib><creatorcontrib>Faris, Hossam</creatorcontrib><title>A modified bond energy algorithm with fuzzy merging and its application to Arabic text document clustering</title><title>Expert systems with applications</title><description>•The study describes the essential phases of Arabic text clustering.•Bond energy algorithm for text document clustering is presented.•Fuzzy merge algorithm is explored to improve clustering.•Several clustering algorithms 
are compared and evaluated on Arabic datasets. Conventional textual documents clustering algorithms suffer from several shortcomings, such as the slow convergence of the immense high-dimensional data, the sensitivity to the initial value, and the understandability of the description of the resulted clusters. Although many clustering algorithms have been developed for English and other languages, very few have tackled the problem of clustering the under-resourced Arabic language. In this work, we propose a modified version of the Bond Energy Algorithm (BEA) combined with a fuzzy merging technique to solve the problem of Arabic text document clustering. The proposed algorithm, Clustering Arabic Documents based on Bond Energy, hereafter named CADBE, attempts to identify and display natural variable clusters within huge sized data. CADBE has three steps to cluster Arabic documents: the first step instantiates a cluster affinity matrix using the BEA, the second step uses a new and novel method to partition the cluster matrix automatically into small coherent clusters, and the last step uses a fuzzy merging technique to merge similar clusters based on the associations and interrelations between the resulted clusters. Experimental results showed that the proposed algorithm effectively outperformed the conventional clustering algorithms such as Expectation–Maximization (EM), Single Linkage, and UPGMA in terms of clustering purity and entropy. It also outperformed k-means, k-means++, spherical k-means, and CoclusMod in most test cases. However, there are several merits of CADBE. First, unlike the traditional clustering algorithms, it does not require to specify the number of clusters. 
In addition, it produces clusters with distinct boundaries, which makes its results more objective, and finally it is deterministic, such that it is insensitive to the order in which documents are presented to the algorithm.</description><subject>Algorithms</subject><subject>Arabic text document clustering</subject><subject>Bond energy</subject><subject>Bond energy algorithm</subject><subject>Clustering</subject><subject>Fuzzy Merging</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMouK7-AU8Bz12TfiQb8LIsfsGCFz2HNJ2uKW1Tk9R199ebUs9eJjDzPJPhReiWkhUllN03K_AHtUpJGhs0K8T6DC3ommcJ4yI7RwsiCp7klOeX6Mr7hhDKCeEL1GxwZytTG6hwafsKQw9uf8Sq3VtnwmeHD7HiejydjriLI9PvsYqcCR6rYWiNVsHYHgeLN06VRuMAPwFXVo8d9AHrdvQBXNSu0UWtWg83f-8SfTw9vm9fkt3b8-t2s0t0lq5DwhQXlLCS6YyBInnBgTCtNa1BQ6WEAJaLQuRFXsaa6okmcV5HXtSFyJbobt47OPs1gg-ysaPr45cyjVYRQ8hZpNKZ0s5676CWgzOdckdJiZwylY2cMpVTpnLONEoPswTx_m8DTnptoI93GQc6yMqa__Rf-IiBHw</recordid><startdate>20201130</startdate><enddate>20201130</enddate><creator>AlMahmoud, Rana Husni</creator><creator>Hammo, Bassam</creator><creator>Faris, Hossam</creator><general>Elsevier Ltd</general><general>Elsevier BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-4261-8127</orcidid><orcidid>https://orcid.org/0000-0002-5270-7409</orcidid></search><sort><creationdate>20201130</creationdate><title>A modified bond energy algorithm with fuzzy merging and its application to Arabic text document clustering</title><author>AlMahmoud, Rana Husni ; Hammo, Bassam ; Faris, 
Hossam</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-6a79106b6c36ea0457e06ccc1feceda99e64959454b5942c79100cccfc369f593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Arabic text document clustering</topic><topic>Bond energy</topic><topic>Bond energy algorithm</topic><topic>Clustering</topic><topic>Fuzzy Merging</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>AlMahmoud, Rana Husni</creatorcontrib><creatorcontrib>Hammo, Bassam</creatorcontrib><creatorcontrib>Faris, Hossam</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>AlMahmoud, Rana Husni</au><au>Hammo, Bassam</au><au>Faris, Hossam</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A modified bond energy algorithm with fuzzy merging and its application to Arabic text document clustering</atitle><jtitle>Expert systems with applications</jtitle><date>2020-11-30</date><risdate>2020</risdate><volume>159</volume><spage>113598</spage><pages>113598-</pages><artnum>113598</artnum><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>•The study describes the essential phases of Arabic text clustering.•Bond energy algorithm for text document clustering is presented.•Fuzzy merge algorithm is explored to improve clustering.•Several 
clustering algorithms are compared and evaluated on Arabic datasets. Conventional textual documents clustering algorithms suffer from several shortcomings, such as the slow convergence of the immense high-dimensional data, the sensitivity to the initial value, and the understandability of the description of the resulted clusters. Although many clustering algorithms have been developed for English and other languages, very few have tackled the problem of clustering the under-resourced Arabic language. In this work, we propose a modified version of the Bond Energy Algorithm (BEA) combined with a fuzzy merging technique to solve the problem of Arabic text document clustering. The proposed algorithm, Clustering Arabic Documents based on Bond Energy, hereafter named CADBE, attempts to identify and display natural variable clusters within huge sized data. CADBE has three steps to cluster Arabic documents: the first step instantiates a cluster affinity matrix using the BEA, the second step uses a new and novel method to partition the cluster matrix automatically into small coherent clusters, and the last step uses a fuzzy merging technique to merge similar clusters based on the associations and interrelations between the resulted clusters. Experimental results showed that the proposed algorithm effectively outperformed the conventional clustering algorithms such as Expectation–Maximization (EM), Single Linkage, and UPGMA in terms of clustering purity and entropy. It also outperformed k-means, k-means++, spherical k-means, and CoclusMod in most test cases. However, there are several merits of CADBE. First, unlike the traditional clustering algorithms, it does not require to specify the number of clusters. 
In addition, it produces clusters with distinct boundaries, which makes its results more objective, and finally it is deterministic, such that it is insensitive to the order in which documents are presented to the algorithm.</abstract><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2020.113598</doi><orcidid>https://orcid.org/0000-0003-4261-8127</orcidid><orcidid>https://orcid.org/0000-0002-5270-7409</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0957-4174
ispartof Expert systems with applications, 2020-11, Vol.159, p.113598, Article 113598
issn 0957-4174
1873-6793
language eng
recordid cdi_proquest_journals_2454517446
source ScienceDirect Journals (5 years ago - present)
subjects Algorithms
Arabic text document clustering
Bond energy
Bond energy algorithm
Clustering
Fuzzy Merging
title A modified bond energy algorithm with fuzzy merging and its application to Arabic text document clustering
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T08%3A20%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20modified%20bond%20energy%20algorithm%20with%20fuzzy%20merging%20and%20its%20application%20to%20Arabic%20text%20document%20clustering&rft.jtitle=Expert%20systems%20with%20applications&rft.au=AlMahmoud,%20Rana%20Husni&rft.date=2020-11-30&rft.volume=159&rft.spage=113598&rft.pages=113598-&rft.artnum=113598&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2020.113598&rft_dat=%3Cproquest_cross%3E2454517446%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2454517446&rft_id=info:pmid/&rft_els_id=S095741742030422X&rfr_iscdi=true