Latent Ontological Feature Discovery for Text Clustering

Bibliographic Details
Main Authors:	Duong, V.T.T.; Cao, T.H.; Chau, C.K.; Quan, T.T.
Format:	Conference Proceeding
Language:	English
Subjects:	Cities and towns; Clustering algorithms; Computer science; Entropy; Information retrieval; Information technology; Labeling; Mathematics; Ontologies; Vectors

Description
Summary:	The content of a text is mainly defined by the keywords and named entities occurring in it. For news articles in particular, named entities are usually important in defining the semantics. However, named entities have ontological features, namely their aliases, types, and identifiers, which are hidden from their textual appearance. In this paper, we explore weighted combinations of those latent named entity features with keywords for text clustering. To that end, the traditional vector space model is adapted to use multiple vectors defined over the spaces of entity names, types, name-type pairs, identifiers, and keywords. Clustering quality is evaluated with two types of measures: self purity-separation measures and relative comparison measures. Hard and fuzzy clustering experiments with the proposed model are conducted and evaluated on selected subsets of Reuters-21578.
DOI:	10.1109/RIVF.2009.5174647
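
The multi-vector model described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say how the per-space vectors are weighted or combined, so the sketch assumes raw term counts (rather than, say, tf-idf) and a normalized weighted sum of per-space cosine similarities. The names `SPACES`, `cosine`, and `combined_similarity`, the two toy documents, and the identifier strings are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical labels for the five spaces named in the summary.
SPACES = ("name", "type", "name_type", "identifier", "keyword")


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def combined_similarity(doc_a, doc_b, weights):
    """Weighted combination of per-space cosine similarities.

    Assumption: the combination is a weighted sum normalized by the total
    weight. The source only states that weighted combinations are explored.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(
        w * cosine(doc_a.get(space, {}), doc_b.get(space, {}))
        for space, w in weights.items()
    )
    return score / total


# Invented toy documents: both mention the name "Georgia", but the latent
# type and identifier differ (a country versus a US state).
doc_a = {
    "name": Counter({"Georgia": 1}),
    "type": Counter({"Country": 1}),
    "name_type": Counter({("Georgia", "Country"): 1}),
    "identifier": Counter({"id:georgia_country": 1}),
    "keyword": Counter({"election": 1, "vote": 1}),
}
doc_b = {
    "name": Counter({"Georgia": 1}),
    "type": Counter({"State": 1}),
    "name_type": Counter({("Georgia", "State"): 1}),
    "identifier": Counter({"id:georgia_us_state": 1}),
    "keyword": Counter({"election": 1, "senate": 1}),
}

equal_weights = {space: 1.0 for space in SPACES}
print(combined_similarity(doc_a, doc_b, equal_weights))
print(combined_similarity(doc_a, doc_b, {"name": 1.0}))
print(combined_similarity(doc_a, doc_b, {"identifier": 1.0}))
```

With weight on names alone the two toy documents look identical, while weight on identifiers alone separates them completely. That contrast is the kind of effect the latent features are meant to expose, and a clustering algorithm could use `combined_similarity` as its pairwise similarity under these assumptions.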