Techniques for the measurement of clustering tendency in document retrieval systems

Bibliographic details
Published in:	Journal of information science 1987-01, Vol.13 (6), p.361-365
Main authors:	El-Hamdouchi, Abdelmoula; Willett, Peter
Format:	Article
Language:	English
Subjects:	Classification; Cluster analysis; Clustering; Computerized information storage and retrieval; Computerized subject indexing; Content analysis; Documents; Exact sciences and technology; Indexing; Indexing. Classification. Abstracting. Syntheses; Information and communication sciences; Information and document structure and analysis; Information processing and retrieval; Information retrieval; Information science. Documentation; Information storage and retrieval; Information work; Sciences and techniques of general use; Searches; Specialized information sources; Staff relations; Subject indexing; Technical services; Terms

Description
Abstract:	The use of automatic classification techniques has been suggested as a means of increasing the effectiveness of document retrieval systems; however, the automatic generation of a classification requires a large amount of computation, and it is thus of importance to know whether this computation will result in material increases in retrieval performance. This paper describes three methods - the overlap test, the nearest neighbour test and the density test - which can be used to measure the degree of clustering tendency in a set of documents. It is shown that the three tests are not in complete agreement with each other in their evaluation of the degree of clustering tendency present in seven document test collections. A comparison of the predicted degree of clustering tendency with the relative effectiveness of cluster and non-cluster searches suggests that the density test gives the most useful results; it also has the advantage that it does not require query and relevance data and can thus be used in a predictive manner when a document collection is to be processed for the first time.
ISSN:	0165-5515, 1741-6485
DOI:	10.1177/016555158701300607