Near Duplicate Document Detection Survey

Bibliographic details
Published in:	International journal of computer science & communication networks 2012-04, Vol.2 (2), p.147-147
Main authors:	Alsulami, Bassma S; Abulkhair, Maysoon F; Eassa, Fathy E
Format:	Article
Language:	English
Subjects:	Algorithms; Collection; Communication networks; Filtering; Filtration; Information retrieval; Reproduction; Search engines
Online access:	Full text

Description
Abstract:	Search engines are a major breakthrough for retrieving information on the web. However, the list of retrieved documents contains a high percentage of duplicate and near-duplicate results, so there is a need to improve the performance of search results. Some current search engines use data-filtering algorithms that eliminate duplicate and near-duplicate documents to save users time and effort. Identifying similar or near-duplicate pairs in a large collection is a significant problem with widespread applications. This survey presents an up-to-date review of the existing literature on duplicate and near-duplicate detection on the Web.
ISSN:	2249-5789