Spatially directed crawling of documents

Bibliographic details
Main authors: Frank, John R; Donoghue, Karen
Format: Patent
Language: English
Description
Abstract: A method for populating a document repository that involves: retrieving a document address from a page queue that stores document addresses; loading into the document repository a document that is identified by the retrieved document address; parsing the loaded document for links to new documents; storing addresses of the new documents into the page queue along with a spatial relevance level for each stored address; and iteratively repeating the steps of retrieving, loading, parsing and storing to populate the document repository, wherein retrieving involves using the spatial relevance levels of the stored addresses in the page queue to determine which document addresses are retrieved from the page queue.
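
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how a spatial relevance level is computed or how the page queue is organized, so the heap-based queue, the `fetch` and `relevance` callables, the `max_docs` limit, and the toy `WEB` and `SCORES` data are all assumptions made for illustration.

```python
import heapq
import itertools
import re


def extract_links(document):
    """Parse a loaded document for links (naive href extraction, for illustration only)."""
    return re.findall(r'href="([^"]+)"', document)


def crawl(seeds, fetch, relevance, max_docs=100):
    """Populate a repository, always retrieving the most spatially relevant address next.

    seeds     -- iterable of (address, relevance_level) pairs
    fetch     -- hypothetical callable: address -> document text, or None on failure
    relevance -- hypothetical callable: (link, parent_document) -> numeric level
    """
    order = itertools.count()   # tie-breaker: equal levels pop in insertion order
    page_queue = []             # heap of (-level, order, address)
    seen = set()
    for address, level in seeds:
        heapq.heappush(page_queue, (-level, next(order), address))
        seen.add(address)

    repository = {}
    while page_queue and len(repository) < max_docs:
        # Retrieve: the stored relevance level decides which address comes out.
        _, _, address = heapq.heappop(page_queue)
        # Load the identified document into the repository.
        document = fetch(address)
        if document is None:
            continue
        repository[address] = document
        # Parse for links and store each new address with its relevance level.
        for link in extract_links(document):
            if link not in seen:
                seen.add(link)
                level = relevance(link, document)
                heapq.heappush(page_queue, (-level, next(order), link))
    return repository


# Toy data standing in for the web and for a spatial relevance scorer (both assumed).
WEB = {
    "a": '<a href="b">b</a> <a href="c">c</a>',
    "b": '<a href="d">d</a>',
    "c": "no links here",
    "d": "no links here",
}
SCORES = {"b": 0.2, "c": 0.9, "d": 0.5}

if __name__ == "__main__":
    repo = crawl([("a", 1.0)], WEB.get, lambda link, parent: SCORES.get(link, 0.0))
    print(list(repo))  # crawl order: ['a', 'c', 'b', 'd']
```

In this sketch, address `c` is loaded before `b` because it was stored with a higher level, and `d` is loaded only after `b` has been parsed and its link discovered. A heap is one convenient way to let the stored levels drive retrieval; any queue that orders or filters by level would fit the abstract's wording equally well.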