Leveraging Heritrix and the Wayback Machine on a corporate intranet : a case study on improving corporate archives
Published in: | D-Lib magazine 2016-01, Vol.22 (1), p.1 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Presents a case study on using the open-source, web-scale web archiving tools Heritrix and the Wayback Machine to automatically archive the MITRE Information Infrastructure (MII). The study aims to outline the challenges of intranet web archiving, identify situations in which the open-source tools are not well suited to the needs of corporate archivists, and make recommendations for future corporate archivists wishing to use such tools. Explains how the authors performed a crawl of 143,268 URIs to demonstrate that the crawlers are easy to set up, crawl the intranet efficiently, and improve archive management. Notes the challenges that arise when the intranet contains sensitive information, when areas with potential archival value require user credentials, or when archival targets make extensive use of internally developed and customised web services. Discusses recommended approaches for overcoming these challenges. Source: National Library of New Zealand Te Puna Matauranga o Aotearoa, licensed by the Department of Internal Affairs for re-use under the Creative Commons Attribution 3.0 New Zealand Licence. |
---|---|
ISSN: | 1082-9873 |
DOI: | 10.1045/january2016-brunelle |