Method for learning character patterns to interactively control the scope of a web crawler

Bibliographic details
Main authors: Bharat, Krishna Asur; Miller, Robert Chisolm
Format: Patent
Language: English
Description
Summary: This invention relates generally to Web crawlers, and more particularly to learning character patterns in queries to control the scope of Web crawler searches for Web pages. A method controls a Web search for server computer resources by an end-user Web crawler. Each resource, such as a Web page, is located by a resource address specified as a character string. The end-user defines the scope of an initial Web search through settings, and the settings are used to search the Web for resources within that scope. The set of resources located during the search is rendered on an output device, and positive and negative examples are selected from the set to infer a rule. The rule is displayed, together with the subset of resources that match it. The selecting, inferring, and rendering steps are repeated while searching until a final rule is obtained. The final rule matches resources that the crawler should process and does not match resources that it should avoid.
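
The select, infer, and render loop described in the summary can be sketched roughly as follows. The summary does not say how a rule is inferred from the examples, so this sketch assumes the simplest character pattern, a shared address prefix. The names `PrefixRule`, `infer_rule`, and `matching_subset`, and the example addresses, are hypothetical and do not come from the patent.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixRule:
    """Hypothetical rule: an address matches if it starts with `prefix`."""
    prefix: str

    def matches(self, address: str) -> bool:
        return address.startswith(self.prefix)


def infer_rule(positives, negatives):
    """Infer a prefix rule from example addresses (an assumed strategy).

    The longest common prefix of the positives is the most specific prefix
    that covers all of them. If it also matches a negative example, every
    shorter prefix would too, so no prefix rule separates the examples and
    None is returned.
    """
    if not positives:
        return None
    rule = PrefixRule(os.path.commonprefix(list(positives)))
    if any(rule.matches(address) for address in negatives):
        return None
    return rule


def matching_subset(rule, addresses):
    """Return the located addresses that the rule matches, for display."""
    return [address for address in addresses if rule.matches(address)]


# Addresses located by an initial, scope-limited search (made-up values).
located = [
    "http://example.org/docs/a.html",
    "http://example.org/docs/b.html",
    "http://example.org/ads/x.html",
    "http://other.example/index.html",
]

# One round of the loop: the end-user marks positive and negative examples.
positives = located[:2]
negatives = [located[2]]

rule = infer_rule(positives, negatives)
if rule is not None:
    print("rule: address starts with", rule.prefix)
    for address in matching_subset(rule, located):
        print("  match:", address)
```

In an interactive session the end-user would inspect the displayed rule and its matching subset, mark further examples, and repeat until the rule covers what the crawler should process and excludes what it should avoid. A prefix is only one possible pattern class; richer character patterns would need a different inference step.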