Newsgroup topic extraction using term-cluster weighting and Pillar K-Means clustering
Published in: | International journal of computers & applications, 2022-04, Vol. 44 (4), p. 357-364 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Topic extraction is an essential tool for gathering information from a vast number of sources. This paper introduces a new approach to extracting topics from a collection of text documents. To obtain the topics, preprocessing steps first remove unnecessary parts of the documents. Then, a term frequency-inverse document frequency (TF-IDF) model is built to weight the terms in the documents. After that, an SVD-based feature transformation builds the features used for clustering. Prior to the clustering process, the Pillar algorithm is run to select the initial centroids for K-Means clustering. Finally, the weights of terms in clusters are calculated, using the term-cluster weight as the basis for choosing topics from the clusters. Based on the experimental results, it is concluded that the framework achieves satisfactory results, attaining accuracies of 100%, 95.1%, 83.7%, and 68.7% for 4 topics obtained from the Binary2, Multi5, Multi7, and Multi10 categories of the 20Newsgroup dataset. |
---|---|
ISSN: | 1206-212X, 1925-7074 |
DOI: | 10.1080/1206212X.2020.1757246 |
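
The pipeline summarized in the abstract (term weighting, centroid seeding, K-Means, per-cluster term weights) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and several parts are assumptions: the function names (`tfidf`, `farthest_point_seeds`, `kmeans`, `top_terms`) and the toy corpus are hypothetical; the IDF form `log(n/df) + 1` is one common variant; the seeding is a simplified farthest-point scheme standing in for the Pillar algorithm, whose details this record does not give; the "term-cluster weight" is approximated as the mean TF-IDF weight of a term within a cluster; and the SVD-based feature transformation is omitted to keep the example short.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocab, rows): L2-normalised TF-IDF vectors, one per document."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenised for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}  # assumed IDF variant
    rows = []
    for doc in tokenised:
        tf = Counter(doc)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def farthest_point_seeds(rows, k):
    """Simplified stand-in for Pillar (assumption): pick the point with the
    largest accumulated distance to the grand mean and to earlier seeds."""
    grand_mean = mean(rows)
    acc = [dist(r, grand_mean) for r in rows]
    seeds = []
    for _ in range(k):
        idx = max((i for i in range(len(rows)) if i not in seeds),
                  key=lambda i: acc[i])
        seeds.append(idx)
        acc = [a + dist(r, rows[idx]) for a, r in zip(acc, rows)]
    return [list(rows[i]) for i in seeds]

def kmeans(rows, centroids, iters=20):
    """Plain Lloyd iterations from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda c: dist(r, centroids[c]))
                  for r in rows]
        for c in range(len(centroids)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels

def top_terms(vocab, rows, labels, cluster, n=3):
    """Rank terms by mean TF-IDF inside one cluster (assumed weighting)."""
    members = [r for r, lab in zip(rows, labels) if lab == cluster]
    ranked = sorted(zip(vocab, mean(members)), key=lambda p: -p[1])
    return [term for term, _ in ranked[:n]]

# Hypothetical toy corpus with two obvious themes.
docs = [
    "graphics image rendering image",
    "image rendering software graphics",
    "hockey team season game",
    "game hockey players team",
]
vocab, rows = tfidf(docs)
labels = kmeans(rows, farthest_point_seeds(rows, k=2))
for cluster in sorted(set(labels)):
    print(cluster, top_terms(vocab, rows, labels, cluster))
```

On this toy corpus the two graphics documents and the two hockey documents end up in separate clusters, and the highest-weighted terms of each cluster serve as its topic words. A fuller version would insert a truncated SVD between the weighting and clustering steps, as the abstract describes.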