Method and apparatus for web crawler data collection
Format: Patent
Language: English
Abstract: A method is provided for Web crawler data collection. The method includes the step of collecting information associated with a plurality of queries, the information relating to the results of the queries and/or the responses to them. Estimates of return probabilities, clicking probabilities, and incorrect response probabilities are then calculated, at least in part, from the collected information. The estimated return probabilities relate to the probability that a search engine will return a particular Web page in a particular position of a particular query result page. The estimated clicking probabilities relate to the frequency with which a client selects a returned Web page in a particular position of a particular query result. The estimated incorrect response probabilities relate to the probability that a query to a stale version of a particular Web page yields an incorrect or vacuous response.

According to another aspect of the invention, the method further includes collecting information about the characteristics and update time distributions of a plurality of Web pages. These characteristics may include whether a Web page's update process is generalized stochastic or generalized deterministic in nature.
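The abstract does not state how the three estimates are computed. As a minimal sketch, assuming the collected information is a log of displayed results with a hypothetical schema (`ResultEvent`, carrying per-result click, staleness, and incorrect-response flags) plus a count of how many times each query was issued, the return, clicking, and incorrect response probabilities can be estimated as simple empirical frequencies. The names `ResultEvent` and `estimate_probabilities` are illustrative only and do not come from the patent.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultEvent:
    """One result shown to a client. Hypothetical log schema, not taken from the patent."""
    query: str
    page: str
    position: int     # 1-based rank of the page on the result page
    clicked: bool     # the client selected this result
    stale: bool       # the indexed copy of the page was out of date at query time
    incorrect: bool   # the stale copy yielded an incorrect or vacuous response


def estimate_probabilities(events, query_counts):
    """Estimate the three probabilities as empirical frequencies (an assumed estimator).

    events:       iterable of ResultEvent
    query_counts: {query: number of times the query was issued}

    Returns (return_p, click_p, incorrect_p) where
      return_p[(q, p, j)] = share of issues of query q that returned page p at position j
      click_p[(q, j)]     = share of results shown at position j for query q that were clicked
      incorrect_p[p]      = share of stale-copy results for page p that were incorrect or vacuous
    """
    returned = Counter()    # (query, page, position) -> times returned
    shown = Counter()       # (query, position) -> results shown
    clicks = Counter()      # (query, position) -> results clicked
    stale_hits = Counter()  # page -> results served from a stale copy
    bad_hits = Counter()    # page -> stale results that were incorrect or vacuous

    for e in events:
        returned[(e.query, e.page, e.position)] += 1
        shown[(e.query, e.position)] += 1
        clicks[(e.query, e.position)] += int(e.clicked)
        if e.stale:
            stale_hits[e.page] += 1
            bad_hits[e.page] += int(e.incorrect)

    return_p = {k: n / query_counts[k[0]] for k, n in returned.items()}
    click_p = {k: clicks[k] / n for k, n in shown.items()}
    incorrect_p = {p: bad_hits[p] / n for p, n in stale_hits.items()}
    return return_p, click_p, incorrect_p


# Tiny illustrative log: the query "jaguar" was issued twice, two results per issue.
log = [
    ResultEvent("jaguar", "a.com", 1, clicked=True, stale=True, incorrect=True),
    ResultEvent("jaguar", "b.com", 2, clicked=False, stale=False, incorrect=False),
    ResultEvent("jaguar", "a.com", 1, clicked=False, stale=True, incorrect=False),
    ResultEvent("jaguar", "c.com", 2, clicked=True, stale=False, incorrect=False),
]
ret, clk, bad = estimate_probabilities(log, {"jaguar": 2})
print(ret)
print(clk)
print(bad)
```

In this toy log, page `a.com` is returned at rank 1 every time the query is issued, so its estimated return probability is 1.0, and one of its two stale-copy results was incorrect, giving an estimated incorrect response probability of 0.5. Plain frequency counts are the simplest choice; with sparse logs, a smoothed estimate (for example, adding pseudo-counts to numerator and denominator) would likely be more stable.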