Topic Discovery from Text Using Aggregation of Different Clustering Methods

Cluster analysis is an un-supervised learning technique that is widely used in the process of topic discovery from text. The research presented here proposes a novel un-supervised learning approach based on aggregation of clusterings produced by different clustering techniques. By examining and comb...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Ayad, Hanan, Kamel, Mohamed
Format: Buchkapitel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Cluster analysis is an un-supervised learning technique that is widely used in the process of topic discovery from text. The research presented here proposes a novel un-supervised learning approach based on aggregation of clusterings produced by different clustering techniques. By examining and combining two different clusterings of a document collection, the aggregation aims at revealing a better structure of the data rather than imposing one that is imposed or constrained by the clustering method itself. When clusters of documents are formed, a process called topic extraction picks terms from the feature space (i.e. the vocabulary of the whole collection) to describe the topic of each cluster. It is proposed at this stage to re-compute terms weights according to the revealed cluster structure. The work further investigates the adaptive setup of the parameters required for the clustering and aggregation techniques. Finally, a topic accuracy measure is developed and used along with the F-measure to evaluate and compare the extracted topics and the clustering quality (respectively) before and after the aggregation. Experimental evaluation shows that the aggregation can successfully improve the clustering quality and the topic accuracy over individual clustering techniques.
ISSN:0302-9743
1611-3349
DOI:10.1007/3-540-47922-8_14