Clustering texts using feature similarity based AHC algorithm
This article proposes the modified AHC (Agglomerative Hierarchical Clustering) algorithm which considers the feature similarity and is applied to the text clustering. The words which are given as features for encoding texts into numerical vectors are semantic related entities, rather than independen...
Gespeichert in:
Veröffentlicht in: | Journal of intelligent & fuzzy systems 2018-01, Vol.35 (6), p.5993-6003 |
---|---|
1. Verfasser: | |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | This article proposes the modified AHC (Agglomerative Hierarchical Clustering) algorithm which considers the feature similarity and is applied to the text clustering. The words which are given as features for encoding texts into numerical vectors are semantic related entities, rather than independent ones, and the synergy effect between the word clustering and the text clustering is expected by combining both of them with each other. In this research, we define the similarity metric between numerical vectors considering the feature similarity, and modify the AHC algorithm by adopting the proposed similarity metric as the approach to the text clustering. The proposed AHC algorithm is empirically validated as the better approach in clustering texts in news articles and opinions. The significance of this research is to improve the clustering performance by utilizing the feature similarities. |
---|---|
ISSN: | 1064-1246 1875-8967 |
DOI: | 10.3233/JIFS-169840 |