Document Clustering


Bibliographic Details
Main Author: Atkinson-Abutridy, John
Format: Book chapter
Language: English
Online Access: Full text
Description
Summary: Similarity-based clustering of documents, which finds patterns that characterize the data, is one of the most important tasks in text analytics applications. For documents, clustering requires efficient ways to represent documents and to measure the distance or closeness between them. Given such a representation, different cluster generation strategies can be used. One of the most popular is the K-means clustering method, which creates clusters based on the distance of the input data to the cluster centers, which is why the resulting groups tend to have concentric topologies. Extensions of the technique, such as the Self-Organizing Map (SOM), make it possible not only to create clusters with different types of topology but also to learn the best assignment of input data to those clusters by considering the relationship with neighboring data points, which makes the approach attractive as a global optimization technique.
DOI:10.1201/9781003280996-7
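
Illustration: The summary describes K-means as assigning each input to the nearest cluster center. The following is a minimal sketch of that idea, not the chapter's own implementation. The term-frequency representation, the Euclidean distance, the toy documents, the fixed initial centers, and the function names vectorize, euclidean, and kmeans are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(docs):
    """Represent each document as a term-frequency vector over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, initial_centers, max_iter=100):
    """Plain K-means: assign to the nearest center, then recompute the centers."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its closest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: euclidean(v, centers[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Update step: each center becomes the mean of its assigned vectors.
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy corpus: two finance-like and two sports-like documents.
docs = [
    "stock market prices fall",
    "market prices rise on stock news",
    "team wins football match",
    "football team loses match",
]
vectors = vectorize(docs)
# Fixed initial centers (an assumption) keep the sketch deterministic.
labels, centers = kmeans(vectors, initial_centers=[vectors[0], vectors[2]])
print(labels)
```

In practice the initial centers are usually chosen at random, and the result can depend on that choice, so several restarts are common. A SOM would differ by also updating the neighbors of the winning unit on a grid, which this sketch does not attempt.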