Leveraging Heritrix and the Wayback Machine on a corporate intranet: a case study on improving corporate archives

Bibliographic details
Published in: D-Lib Magazine, 2016-01, Vol. 22 (1), p. 1
First author: Brunelle, Justin F
Format: Article
Language: English
Description
Abstract: Presents a case study investigating the use of the open-source, web-scale web archiving tools Heritrix and the Wayback Machine to automatically archive the MITRE Information Infrastructure (MII). The study outlines the challenges of intranet web archiving, identifies situations in which the open-source tools are not well suited to the needs of corporate archivists, and makes recommendations for future corporate archivists wishing to use such tools. Explains how a crawl of 143,268 URIs was performed, demonstrating that the crawlers are easy to set up, crawl the intranet efficiently, and improve archive management. Notes the challenges that arise when sensitive information is present, when areas with potential archival value require user credentials, or when archival targets make extensive use of internally developed and customised web services. Discusses recommended approaches for overcoming these challenges. Source: National Library of New Zealand Te Puna Matauranga o Aotearoa, licensed by the Department of Internal Affairs for re-use under the Creative Commons Attribution 3.0 New Zealand Licence.
ISSN: 1082-9873
DOI: 10.1045/january2016-brunelle