Method and system of ranking and clustering for document indexing and retrieval

Bibliographic details
Main authors:	Caudill, Maureen; Tseng, Jason Chun-Ming; Wang, Lei
Format:	Patent
Language:	eng

Description
Summary:	The relevancy ranking and clustering method and system for document indexing and retrieval of the present invention provides mechanisms for an information retrieval system to rank documents based on their relevance to a query and in accordance with user feedback. A user can make queries in the form of natural language, keywords or predicates. Queries are converted into ontology-based predicate structures and compared against documents, which have been previously parsed for their predicates, to obtain the best possible matching documents, which are then presented to the user. The method and system is designed to automate judgments about which documents are the best possible matches to a query within a given index, and to allow users to provide feedback in order to fine-tune the automated judgment procedure. The method and system determines the relevance of a document to a user's query using a similarity comparison process. Input queries are parsed into one or more query predicate structures using an ontological parser. The same ontological parser parses a set of known documents to generate one or more document predicate structures. Each query predicate structure is compared with each document predicate structure to determine a matching degree, represented by a real number. A multilevel modifier strategy assigns different relevance values to the different parts of each predicate structure match in order to calculate that matching degree. The relevance of a document to a user's query is determined by calculating a similarity coefficient based on the structures of each pair of query predicates and document predicates. Documents are autonomously clustered using a self-organizing neural network, which provides a coordinate system for making judgments in a non-subjective fashion.
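
The comparison step described in the summary can be illustrated with a minimal Python sketch. It is not the patented procedure: the `Predicate` class, the two weights `HEAD_WEIGHT` and `ARGS_WEIGHT`, and the averaging rule in `similarity_coefficient` are all hypothetical choices made here for illustration, since the summary does not specify the actual modifier values or formula. The sketch only shows the general shape: a weighted matching degree per predicate pair, combined into one similarity coefficient per document, which is then used for ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical predicate structure: a head (e.g. a verb) plus its arguments."""
    head: str
    args: tuple = ()


# Hypothetical "multilevel modifier" weights (assumed values, not from the patent):
# a matching head contributes more than matching arguments.
HEAD_WEIGHT = 0.6
ARGS_WEIGHT = 0.4


def matching_degree(query: Predicate, doc: Predicate) -> float:
    """Score one query/document predicate pair as a real number in [0, 1]."""
    if query.head != doc.head:
        return 0.0
    if not query.args:
        # Nothing further to compare, so a head match counts as a full match.
        return HEAD_WEIGHT + ARGS_WEIGHT
    shared = len(set(query.args) & set(doc.args))
    return HEAD_WEIGHT + ARGS_WEIGHT * shared / len(set(query.args))


def similarity_coefficient(query_preds, doc_preds) -> float:
    """Average, over the query predicates, of the best matching degree in the document."""
    if not query_preds or not doc_preds:
        return 0.0
    best = [max(matching_degree(q, d) for d in doc_preds) for q in query_preds]
    return sum(best) / len(best)


def rank(query_preds, documents):
    """Return document names ordered by decreasing similarity coefficient."""
    scored = [(similarity_coefficient(query_preds, preds), name)
              for name, preds in documents.items()]
    return [name for _, name in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


query = [Predicate("acquire", ("company", "startup"))]
documents = {
    "doc_a": [Predicate("acquire", ("company", "startup"))],
    "doc_b": [Predicate("acquire", ("company", "bank"))],
    "doc_c": [Predicate("sell", ("company",))],
}
ranking = rank(query, documents)
```

Here `doc_a` shares the head and both arguments with the query, `doc_b` shares the head and one argument, and `doc_c` shares neither, so the ranking places them in that order. A real system would obtain the predicates from an ontological parser and could adjust the weights from user feedback; both parts are omitted from this sketch.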