Concept Chain Based Text Clustering
Main authors: | , , |
---|---|
Format: | Conference proceedings |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Unlike the objects handled in familiar clustering tasks, text documents occupy sparse data spaces. A common way to represent a document is as a bag of its component words, but this ignores the semantic relations between words. In this paper, we propose a novel document representation approach that strengthens the discriminative features of document objects. We replace the terms in documents with concepts from WordNet and construct a model named the Concept CHain Model (CCHM) for document representation. CCHM is applied in both partitioning and agglomerative clustering analysis. Hierarchical clustering proceeds at different levels of the concept chains. Experimental evaluation on textual data sets demonstrates the validity and efficiency of CCHM. The results of the concept-based experiments show the superiority of our approach in hierarchical clustering. |
---|---|
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/11596448_105 |
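
The summary's central idea, replacing document terms with concepts so that related words stop looking unrelated, can be illustrated with a small sketch. This is not the paper's CCHM, whose construction the record does not describe. It assumes a tiny hand-written hypernym map (`TOY_HYPERNYMS`, hypothetical) in place of WordNet, and the helper names `concept_chain`, `concept_bag`, and `cosine` are invented for the illustration. It shows only how moving up a chain of concepts raises the similarity of two documents that share no words.

```python
from collections import Counter
from math import sqrt

# Hypothetical hypernym links standing in for WordNet (child -> parent).
TOY_HYPERNYMS = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "car": "vehicle",
    "truck": "vehicle",
}


def concept_chain(term):
    """Return the term followed by its ancestors, most specific first."""
    chain = [term]
    while chain[-1] in TOY_HYPERNYMS:
        chain.append(TOY_HYPERNYMS[chain[-1]])
    return chain


def concept_bag(tokens, level):
    """Count the concept found `level` steps up each token's chain.

    Chains shorter than `level` are capped at their topmost concept.
    Level 0 is the plain bag of words.
    """
    bag = Counter()
    for tok in tokens:
        chain = concept_chain(tok)
        bag[chain[min(level, len(chain) - 1)]] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two Counter-based sparse vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


doc_a = ["dog", "dog", "car"]
doc_b = ["cat", "truck"]

# Similarity at the word level and at two higher concept levels.
sim_words = cosine(concept_bag(doc_a, 0), concept_bag(doc_b, 0))
sim_level1 = cosine(concept_bag(doc_a, 1), concept_bag(doc_b, 1))
sim_level2 = cosine(concept_bag(doc_a, 2), concept_bag(doc_b, 2))

print(sim_words, sim_level1, sim_level2)
```

In this toy setup the two documents have no words in common, so their bag-of-words similarity is zero. One level up they overlap on `vehicle`, and two levels up also on `carnivore`. A clustering routine, partitioning or agglomerative, could then be run on the concept bags at whichever level is chosen.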