Adapting content repositories for crawling and serving
A system for searching files stored in a closed file source that is not accessible via a web crawler obtains file identifiers for files stored in the file source and creates a unique URL for each of the identifiers. Each URL may be based on a file identifier and a domain portion of a URL associated...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A system for searching files stored in a closed file source that is not accessible via a web crawler obtains file identifiers for files stored in the file source and creates a unique URL for each of the identifiers. Each URL may be based on a file identifier and a domain portion of a URL associated with the system. The system may provide the unique URLs to a search engine. The system may respond to a crawl request from the search engine for a particular URL by converting the URL back into a file identifier, obtaining the contents of the file, creating an HTTP response from the contents of the file, and returning the response to the search engine. The system may respond to a request for a seed URL with a plurality of URLs as links in a single HTTP response. |
---|