Method for learning character patterns to interactively control the scope of a web crawler
Main authors: | , |
---|---|
Format: | Patent |
Language: | eng |
Summary: | This invention relates generally to Web crawlers, and more particularly to learning character patterns in queries to control the scope of Web crawler searches for Web pages.
A method controls a Web search for server computer resources by an end-user Web crawler. Each resource, such as a Web page, is located by a resource address specified as a character string. The end-user defines a scope for an initial Web search through settings. The settings are used to search the Web for resources within that scope. The set of resources located during the search is rendered on an output device, and positive and negative examples are selected from the set to infer a rule. The rule is displayed, together with the subset of resources that match it. The selecting, inferring, and rendering steps are repeated while searching until a final rule is obtained. The final rule matches resources that the crawler should process and does not match resources that it should avoid. |
---|
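The summary above does not say how the rule is inferred from the selected examples. As an illustration only, the following Python sketch assumes the simplest possible character pattern, a shared address prefix. The function names `infer_prefix_rule` and `matching` and the `example.com` addresses are hypothetical and do not come from the patent.

```python
from os.path import commonprefix


def infer_prefix_rule(positives, negatives):
    """Infer a prefix rule from example resource addresses.

    Assumption: the rule is the longest common character prefix of the
    positive examples. Returns None when no positives are given, when the
    positives share no prefix, or when the prefix also matches a negative.
    """
    if not positives:
        return None
    prefix = commonprefix(list(positives))  # character-wise common prefix
    if not prefix:
        return None
    if any(address.startswith(prefix) for address in negatives):
        return None  # rule would admit a resource the user rejected
    return prefix


def matching(rule, resources):
    """Return the subset of resource addresses that match the prefix rule."""
    return [address for address in resources if address.startswith(rule)]


# Hypothetical addresses located during an initial search.
found = [
    "http://example.com/docs/a.html",
    "http://example.com/docs/b.html",
    "http://example.com/ads/banner.html",
]

# The end-user marks the first two as positive and the last as negative.
rule = infer_prefix_rule(found[:2], found[2:])
subset = matching(rule, found) if rule else []
```

In an interactive setting the user would inspect `rule` and `subset`, adjust the positive and negative examples, and repeat until the rule covers only the resources the crawler should process. A prefix is a deliberately narrow pattern class; the patented method may infer richer patterns.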