Spatially directed crawling of documents

A method for populating a document repository that involves: retrieving a document address from a page queue that stores document addresses; loading into the document repository a document that is identified by the retrieved document address; parsing the loaded document for links to new documents; s...

Detailed description

Bibliographic details
Main authors: Frank, John R; Donoghue, Karen
Format: Patent
Language: eng
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Frank, John R
Donoghue, Karen
description A method for populating a document repository that involves: retrieving a document address from a page queue that stores document addresses; loading into the document repository a document that is identified by the retrieved document address; parsing the loaded document for links to new documents; storing addresses of the new documents into the page queue along with a spatial relevance level for each stored address; and iteratively repeating the steps of retrieving, loading, parsing and storing to populate the document repository, wherein retrieving involves using the spatial relevance levels of the stored addresses in the page queue to determine which document addresses are retrieved from the page queue.
format Patent
fullrecord <record><control><sourceid>uspatents_EFH</sourceid><recordid>TN_cdi_uspatents_grants_07539693</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>07539693</sourcerecordid><originalsourceid>FETCH-uspatents_grants_075396933</originalsourceid><addsrcrecordid>eNrjZNAILkgsyUzMyalUSMksSk0uSU1RSC5KLM_JzEtXyE9TSMlPLs1NzSsp5mFgTUvMKU7lhdLcDApuriHOHrqlxUATQCri04sSQZSBuamxpZmlsTERSgDHXymG</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Spatially directed crawling of documents</title><source>USPTO Issued Patents</source><creator>Frank, John R ; Donoghue, Karen</creator><creatorcontrib>Frank, John R ; Donoghue, Karen ; MetaCarta, Inc</creatorcontrib><description>A method for populating a document repository that involves: retrieving a document address from a page queue that stores document addresses; loading into the document repository a document that is identified by the retrieved document address; parsing the loaded document for links to new documents; storing addresses of the new documents into the page queue along with a spatial relevance level for each stored address; and iteratively repeating the steps of retrieving, loading, parsing and storing to populate the document repository, wherein retrieving involves using the spatial relevance levels of the stored addresses in the page queue to determine which document addresses are retrieved from the page 
queue.</description><language>eng</language><creationdate>2009</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/7539693$$EPDF$$P50$$Guspatents$$Hfree_for_read</linktopdf><link.rule.ids>230,308,776,798,881,64015</link.rule.ids><linktorsrc>$$Uhttps://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/7539693$$EView_record_in_USPTO$$FView_record_in_$$GUSPTO$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Frank, John R</creatorcontrib><creatorcontrib>Donoghue, Karen</creatorcontrib><creatorcontrib>MetaCarta, Inc</creatorcontrib><title>Spatially directed crawling of documents</title><description>A method for populating a document repository that involves: retrieving a document address from a page queue that stores document addresses; loading into the document repository a document that is identified by the retrieved document address; parsing the loaded document for links to new documents; storing addresses of the new documents into the page queue along with a spatial relevance level for each stored address; and iteratively repeating the steps of retrieving, loading, parsing and storing to populate the document repository, wherein retrieving involves using the spatial relevance levels of the stored addresses in the page queue to determine which document addresses are retrieved from the page queue.</description><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2009</creationdate><recordtype>patent</recordtype><sourceid>EFH</sourceid><recordid>eNrjZNAILkgsyUzMyalUSMksSk0uSU1RSC5KLM_JzEtXyE9TSMlPLs1NzSsp5mFgTUvMKU7lhdLcDApuriHOHrqlxUATQCri04sSQZSBuamxpZmlsTERSgDHXymG</recordid><startdate>20090526</startdate><enddate>20090526</enddate><creator>Frank, John 
R</creator><creator>Donoghue, Karen</creator><scope>EFH</scope></search><sort><creationdate>20090526</creationdate><title>Spatially directed crawling of documents</title><author>Frank, John R ; Donoghue, Karen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-uspatents_grants_075396933</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng</language><creationdate>2009</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Frank, John R</creatorcontrib><creatorcontrib>Donoghue, Karen</creatorcontrib><creatorcontrib>MetaCarta, Inc</creatorcontrib><collection>USPTO Issued Patents</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Frank, John R</au><au>Donoghue, Karen</au><aucorp>MetaCarta, Inc</aucorp><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Spatially directed crawling of documents</title><date>2009-05-26</date><risdate>2009</risdate><abstract>A method for populating a document repository that involves: retrieving a document address from a page queue that stores document addresses; loading into the document repository a document that is identified by the retrieved document address; parsing the loaded document for links to new documents; storing addresses of the new documents into the page queue along with a spatial relevance level for each stored address; and iteratively repeating the steps of retrieving, loading, parsing and storing to populate the document repository, wherein retrieving involves using the spatial relevance levels of the stored addresses in the page queue to determine which document addresses are retrieved from the page queue.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_uspatents_grants_07539693
source USPTO Issued Patents
title Spatially directed crawling of documents
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T10%3A07%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-uspatents_EFH&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Frank,%20John%20R&rft.aucorp=MetaCarta,%20Inc&rft.date=2009-05-26&rft_id=info:doi/&rft_dat=%3Cuspatents_EFH%3E07539693%3C/uspatents_EFH%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
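
The method summarized in the description field (retrieve an address from the page queue, load the document, parse it for links, store the new addresses with a spatial relevance level, repeat) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `crawl`, `fetch`, `extract_links`, `spatial_relevance`, and `max_documents` are hypothetical, and so are three design choices: using a binary heap as the page queue, skipping addresses that were already queued, and scoring each link from the text of the document that contains it. The record does not say how the relevance level is computed.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Iterable, List, Tuple


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], str],
    extract_links: Callable[[str], List[str]],
    spatial_relevance: Callable[[str, str], float],
    max_documents: int = 100,
) -> Dict[str, str]:
    """Populate a repository, visiting the most spatially relevant addresses first.

    All callables are supplied by the caller (assumptions, not part of the record):
      fetch(address) -> document text
      extract_links(text) -> addresses found in the document
      spatial_relevance(parent_text, address) -> level, higher means more relevant
    """
    repository: Dict[str, str] = {}
    page_queue: List[Tuple[float, int, str]] = []  # (negated level, tie-breaker, address)
    tie_breaker = count()  # keeps insertion order among equal levels
    seen = set()  # assumption: each address is queued at most once

    # Seeds get the best possible priority so they are retrieved first.
    for address in seeds:
        if address not in seen:
            seen.add(address)
            heapq.heappush(page_queue, (float("-inf"), next(tie_breaker), address))

    while page_queue and len(repository) < max_documents:
        # Retrieve: the stored relevance levels decide which address comes out next.
        _, _, address = heapq.heappop(page_queue)

        # Load the identified document into the repository.
        text = fetch(address)
        repository[address] = text

        # Parse for links and store each new address with its relevance level.
        for link in extract_links(text):
            if link in seen:
                continue
            seen.add(link)
            level = spatial_relevance(text, link)
            # heapq is a min-heap, so the level is negated to pop the highest first.
            heapq.heappush(page_queue, (-level, next(tie_breaker), link))

    return repository
```

Passing `fetch`, `extract_links`, and `spatial_relevance` as parameters keeps the sketch independent of any particular HTTP client, HTML parser, or geographic scoring scheme, so the queue ordering can be checked against an in-memory set of pages.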