Spatially directed crawling of documents
Main authors: | , |
---|---|
Format: | Patent |
Language: | eng |
Summary: | A method for populating a document repository, comprising: retrieving a document address from a page queue that stores document addresses; loading into the document repository the document identified by the retrieved document address; parsing the loaded document for links to new documents; storing the addresses of the new documents in the page queue, each with a spatial relevance level; and iteratively repeating the retrieving, loading, parsing, and storing steps to populate the document repository, wherein the retrieving step uses the spatial relevance levels of the stored addresses to determine which document addresses are retrieved from the page queue. |
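
The retrieve, load, parse, and store loop described in the summary can be sketched as a priority-queue crawl. This is a minimal illustration, not the patented implementation: the record does not say how a spatial relevance level is computed or how the queue is ordered, so the `relevance` callback, the `fetch` callback, the in-memory `PAGES` and `LEVELS` tables, and the "highest level first" policy are all hypothetical assumptions made for the example.

```python
import heapq
import itertools
import re

# Assumption: links appear as href="..." attributes; a real crawler would use an HTML parser.
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(seeds, fetch, relevance, max_docs):
    """Populate a repository, always retrieving the most spatially relevant address next.

    seeds     -- iterable of (address, relevance_level) pairs
    fetch     -- hypothetical callback: address -> document text, or None on failure
    relevance -- hypothetical callback: (link, source_document) -> numeric level
    """
    page_queue = []             # heap of (-level, insertion_order, address)
    order = itertools.count()   # tie-breaker so equal levels stay first-in, first-out
    seen = set()                # addresses that have already been queued
    repository = {}             # address -> loaded document

    def store(address, level):
        if address not in seen:
            seen.add(address)
            # heapq is a min-heap, so negate the level to pop the highest first.
            heapq.heappush(page_queue, (-level, next(order), address))

    for address, level in seeds:
        store(address, level)

    while page_queue and len(repository) < max_docs:
        _, _, address = heapq.heappop(page_queue)         # retrieve
        document = fetch(address)                         # load
        if document is None:
            continue
        repository[address] = document
        for link in LINK_RE.findall(document):            # parse
            store(link, relevance(link, document))        # store with its level

    return repository


# Toy in-memory "web" and made-up relevance levels, for illustration only.
PAGES = {
    "a": '<a href="b">B</a> <a href="c">C</a>',
    "b": '<a href="d">D</a>',
    "c": "",
    "d": "",
}
LEVELS = {"a": 1.0, "b": 0.2, "c": 0.9, "d": 0.5}

repository = crawl(
    seeds=[("a", LEVELS["a"])],
    fetch=PAGES.get,
    relevance=lambda link, source: LEVELS.get(link, 0.0),
    max_docs=10,
)
print(list(repository))  # ['a', 'c', 'b', 'd']
```

In this toy run, page `c` is loaded before `b` even though `b` was discovered first, because `c` was stored with the higher level. A plain first-in, first-out queue would have loaded `b` first, which is the difference the relevance-driven retrieval step is meant to make.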