Finding the most similar documents across multiple text databases

We present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to t...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Clement Yu, King-Lup Liu, Wensheng Wu, Weiyi Meng, Rishe, N.
Format:	Tagungsbericht
Sprache:	eng
Schlagworte:	Australia Computer networks Database systems Indexing Information retrieval Information systems Internet ISDN Machine learning Transaction databases
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	We present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to the order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of other databases, the methodology will guarantee the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies are presented to retrieve documents from the identified databases. Experimental results are given to illustrate the relative performance of different strategies.
ISSN:	1092-9959 2378-7104
DOI:	10.1109/ADL.1999.777710