Newsgroup topic extraction using term-cluster weighting and Pillar K-Means clustering

Bibliographic Details
Published in: International journal of computers & applications 2022-04, Vol. 44 (4), p. 357-364
Main authors: Adinugroho, Sigit; Wihandika, Randy C.; Adikara, Putra P.
Format: Article
Language: English
Online access: Full text
Description
Summary: Topic extraction is an essential tool for gathering information from a vast number of sources. This paper introduces a new approach to extracting topics from a collection of text documents. To obtain the topics, preprocessing steps first remove unnecessary parts of the documents. Then, a term frequency-inverse document frequency (TF-IDF) matrix is built to weight the terms in the documents. After that, an SVD-based feature transformation builds the features used for clustering. Before clustering, the Pillar algorithm is run to select the initial centroids for K-Means clustering. Finally, the weights of terms in clusters are calculated, with the term-cluster weight serving as the basis for choosing topics from the clusters. Based on the experimental results, it is concluded that the framework achieves satisfactory results, attaining accuracies of 100%, 95.1%, 83.7%, and 68.7% for 4 topics obtained from the Binary2, Multi5, Multi7, and Multi10 categories of the 20Newsgroup dataset, respectively.
ISSN: 1206-212X, 1925-7074
DOI: 10.1080/1206212X.2020.1757246
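
The pipeline outlined in the summary can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation, and several parts are assumptions: this record does not describe the Pillar algorithm or the term-cluster weight formula, so the sketch substitutes a plain farthest-point seeding for Pillar and a per-cluster sum of TF-IDF weights for the term-cluster weight. The SVD-based feature transformation is left out to avoid external libraries, and preprocessing is reduced to lowercasing and whitespace splitting. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        # Raw term count times log(n / df); terms in every document get weight 0.
        rows.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, rows


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_seeds(rows, k):
    """ASSUMPTION: simple farthest-point seeding, a stand-in for Pillar."""
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    # First seed: the point farthest from the overall mean.
    seeds = [max(range(len(rows)), key=lambda i: dist(rows[i], mean))]
    # Each further seed: the point farthest from its nearest chosen seed.
    while len(seeds) < k:
        seeds.append(max(range(len(rows)),
                         key=lambda i: min(dist(rows[i], rows[s]) for s in seeds)))
    return [list(rows[s]) for s in seeds]


def kmeans(rows, centroids, iters=20):
    """Standard K-Means (Lloyd's iterations) from the given initial centroids."""
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda c: dist(r, centroids[c]))
                  for r in rows]
        for c in range(len(centroids)):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_terms(vocab, rows, labels, k, n_terms=2):
    """ASSUMPTION: score a term in a cluster by summing its TF-IDF over members."""
    topics = []
    for c in range(k):
        totals = [0.0] * len(vocab)
        for r, l in zip(rows, labels):
            if l == c:
                totals = [t + w for t, w in zip(totals, r)]
        ranked = sorted(range(len(vocab)), key=lambda j: -totals[j])
        topics.append([vocab[j] for j in ranked[:n_terms]])
    return topics


# Toy corpus (hypothetical): two documents about cars, two about football.
docs = [
    "engine car engine wheel",
    "car wheel road car",
    "goal team goal match",
    "team match league team",
]
vocab, rows = tfidf(docs)
labels = kmeans(rows, farthest_point_seeds(rows, 2))
topics = top_terms(vocab, rows, labels, 2)
print(labels, topics)
```

On this toy corpus the two car documents and the two football documents end up in separate clusters, and the highest-scoring terms of each cluster act as its topic words. A fuller version would add the SVD step between `tfidf` and `kmeans` and replace both stand-ins with the definitions given in the article itself.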