Document Clustering based on Phrase and Single Term Similarity using Neo4j


Bibliographic details
Published in: International journal of innovative technology and exploring engineering, 2020-01, Vol. 9 (3), p. 3188-3192
Main authors: Kathiria, Preeti; Arolkar, Harshal
Format: Article
Language: English
Online access: Full text
Description
Summary: Document similarity generally relies on single-term similarity measures such as cosine similarity. To achieve better document similarity, phrases, which are a more informative feature, can be used along with single terms. To find shared phrases across the corpus, the Document Index Graph (DIG) representation model is used. The DIG model constructs the graph incrementally and simultaneously finds the phrases shared between the current document and the previously inserted documents. The similarity between documents depends mainly on the number of shared phrases and on single-term similarity; the combination is known as hybrid similarity. The hybrid similarities are used with the well-known density-based clustering technique DBSCAN to assess their effect on cluster quality. Experimental results show that hybrid similarity gives a more accurate degree of document similarity and produces more cohesive clusters.
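
The idea of a hybrid similarity can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give the article's formulas, so the phrase measure used here (Jaccard overlap of contiguous word sequences), the equal weighting `alpha=0.5`, and all function names are assumptions chosen for illustration. The sketch also compares two documents directly and does not build a Document Index Graph.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenization; a stand-in for real preprocessing."""
    return text.lower().split()


def term_cosine(a, b):
    """Cosine similarity between raw term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def phrases(text, min_len=2):
    """All contiguous word sequences with at least min_len words."""
    t = tokens(text)
    return {
        tuple(t[i:j])
        for i in range(len(t))
        for j in range(i + min_len, len(t) + 1)
    }


def phrase_overlap(a, b):
    """Jaccard overlap of the phrase sets (an assumed, simplified measure)."""
    pa, pb = phrases(a), phrases(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def hybrid_similarity(a, b, alpha=0.5):
    """Weighted blend of phrase and single-term similarity.

    alpha is a hypothetical weight, not a value taken from the article.
    """
    return alpha * phrase_overlap(a, b) + (1 - alpha) * term_cosine(a, b)


d1 = "river rafting trips"
d2 = "rafting river trips"  # same terms as d1, different word order

print(term_cosine(d1, d2))        # term-only view: the two look identical
print(hybrid_similarity(d1, d2))  # hybrid view: penalized, no shared phrase
print(hybrid_similarity(d1, d1))  # identical documents score highest
```

The two example documents share every term but no phrase, so the term-only score cannot tell them apart while the hybrid score can. For clustering, one option is to convert the pairwise scores to distances (1 minus similarity) and pass the resulting matrix to a DBSCAN implementation that accepts precomputed distances, such as scikit-learn's `DBSCAN` with `metric="precomputed"`.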
ISSN: 2278-3075
DOI: 10.35940/ijitee.C9050.019320