Carbon Dating The Web: Estimating the Age of Web Resources

In the course of web research it is often necessary to estimate the creation datetime for web resources (in the general case, this value can only be estimated). While it is feasible to manually establish likely datetime values for small numbers of resources, this becomes infeasible if the collection...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:arXiv.org 2013-04
Hauptverfasser: SalahEldeen, Hany M, Nelson, Michael L
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator SalahEldeen, Hany M
Nelson, Michael L
description In the course of web research it is often necessary to estimate the creation datetime for web resources (in the general case, this value can only be estimated). While it is feasible to manually establish likely datetime values for small numbers of resources, this becomes infeasible if the collection is large. We present "carbon date", a simple web application that estimates the creation date for a URI by polling a number of sources of evidence and returning a machine-readable structure with their respective values. To establish a likely datetime, we poll bitly for the first time someone shortened the URI, topsy for the first time someone tweeted the URI, a Memento aggregator for the first time it appeared in a public web archive, Google's time of last crawl, and the Last-Modified HTTP response header of the resource itself. We also examine the backlinks of the URI as reported by Google and apply the same techniques for the resources that link to the URI. We evaluated our tool on a gold-standard data set of 1200 URIs in which the creation date was manually verified. We were able to estimate a creation date for 75.90% of the resources, with 32.78% having the correct value. Given the different nature of the URIs, the union of the various methods produces the best results. While the Google last crawl date and topsy account for nearly 66% of the closest answers, eliminating the web archives or Last-Modified from the results produces the largest overall negative impact on the results. The carbon date application is available for download or use via a webAPI.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2084923797</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2084923797</sourcerecordid><originalsourceid>FETCH-proquest_journals_20849237973</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSwck4sSsrPU3BJLMnMS1cIyUhVCE9NslJwLS7JzIWIlQDFHNNTFfLTQFIKQanF-aVFyanFPAysaYk5xam8UJqbQdnNNcTZQ7egKL-wNLW4JD4LqDAPKBVvZGBhYmlkbG5pbkycKgAQJDV7</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2084923797</pqid></control><display><type>article</type><title>Carbon Dating The Web: Estimating the Age of Web Resources</title><source>Free E- Journals</source><creator>SalahEldeen, Hany M ; Nelson, Michael L</creator><creatorcontrib>SalahEldeen, Hany M ; Nelson, Michael L</creatorcontrib><description>In the course of web research it is often necessary to estimate the creation datetime for web resources (in the general case, this value can only be estimated). While it is feasible to manually establish likely datetime values for small numbers of resources, this becomes infeasible if the collection is large. We present "carbon date", a simple web application that estimates the creation date for a URI by polling a number of sources of evidence and returning a machine-readable structure with their respective values. To establish a likely datetime, we poll bitly for the first time someone shortened the URI, topsy for the first time someone tweeted the URI, a Memento aggregator for the first time it appeared in a public web archive, Google's time of last crawl, and the Last-Modified HTTP response header of the resource itself. We also examine the backlinks of the URI as reported by Google and apply the same techniques for the resources that link to the URI. We evaluated our tool on a gold-standard data set of 1200 URIs in which the creation date was manually verified. We were able to estimate a creation date for 75.90% of the resources, with 32.78% having the correct value. Given the different nature of the URIs, the union of the various methods produces the best results. While the Google last crawl date and topsy account for nearly 66% of the closest answers, eliminating the web archives or Last-Modified from the results produces the largest overall negative impact on the results. The carbon date application is available for download or use via a webAPI.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Applications programs ; Carbon ; Digital archives ; Downloading ; Internet resources ; Standard data</subject><ispartof>arXiv.org, 2013-04</ispartof><rights>2013. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>SalahEldeen, Hany M</creatorcontrib><creatorcontrib>Nelson, Michael L</creatorcontrib><title>Carbon Dating The Web: Estimating the Age of Web Resources</title><title>arXiv.org</title><description>In the course of web research it is often necessary to estimate the creation datetime for web resources (in the general case, this value can only be estimated). While it is feasible to manually establish likely datetime values for small numbers of resources, this becomes infeasible if the collection is large. We present "carbon date", a simple web application that estimates the creation date for a URI by polling a number of sources of evidence and returning a machine-readable structure with their respective values. To establish a likely datetime, we poll bitly for the first time someone shortened the URI, topsy for the first time someone tweeted the URI, a Memento aggregator for the first time it appeared in a public web archive, Google's time of last crawl, and the Last-Modified HTTP response header of the resource itself. We also examine the backlinks of the URI as reported by Google and apply the same techniques for the resources that link to the URI. We evaluated our tool on a gold-standard data set of 1200 URIs in which the creation date was manually verified. We were able to estimate a creation date for 75.90% of the resources, with 32.78% having the correct value. Given the different nature of the URIs, the union of the various methods produces the best results. While the Google last crawl date and topsy account for nearly 66% of the closest answers, eliminating the web archives or Last-Modified from the results produces the largest overall negative impact on the results. The carbon date application is available for download or use via a webAPI.</description><subject>Applications programs</subject><subject>Carbon</subject><subject>Digital archives</subject><subject>Downloading</subject><subject>Internet resources</subject><subject>Standard data</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mSwck4sSsrPU3BJLMnMS1cIyUhVCE9NslJwLS7JzIWIlQDFHNNTFfLTQFIKQanF-aVFyanFPAysaYk5xam8UJqbQdnNNcTZQ7egKL-wNLW4JD4LqDAPKBVvZGBhYmlkbG5pbkycKgAQJDV7</recordid><startdate>20130418</startdate><enddate>20130418</enddate><creator>SalahEldeen, Hany M</creator><creator>Nelson, Michael L</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20130418</creationdate><title>Carbon Dating The Web: Estimating the Age of Web Resources</title><author>SalahEldeen, Hany M ; Nelson, Michael L</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_20849237973</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Applications programs</topic><topic>Carbon</topic><topic>Digital archives</topic><topic>Downloading</topic><topic>Internet resources</topic><topic>Standard data</topic><toplevel>online_resources</toplevel><creatorcontrib>SalahEldeen, Hany M</creatorcontrib><creatorcontrib>Nelson, Michael L</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>SalahEldeen, Hany M</au><au>Nelson, Michael L</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Carbon Dating The Web: Estimating the Age of Web Resources</atitle><jtitle>arXiv.org</jtitle><date>2013-04-18</date><risdate>2013</risdate><eissn>2331-8422</eissn><abstract>In the course of web research it is often necessary to estimate the creation datetime for web resources (in the general case, this value can only be estimated). While it is feasible to manually establish likely datetime values for small numbers of resources, this becomes infeasible if the collection is large. We present "carbon date", a simple web application that estimates the creation date for a URI by polling a number of sources of evidence and returning a machine-readable structure with their respective values. To establish a likely datetime, we poll bitly for the first time someone shortened the URI, topsy for the first time someone tweeted the URI, a Memento aggregator for the first time it appeared in a public web archive, Google's time of last crawl, and the Last-Modified HTTP response header of the resource itself. We also examine the backlinks of the URI as reported by Google and apply the same techniques for the resources that link to the URI. We evaluated our tool on a gold-standard data set of 1200 URIs in which the creation date was manually verified. We were able to estimate a creation date for 75.90% of the resources, with 32.78% having the correct value. Given the different nature of the URIs, the union of the various methods produces the best results. While the Google last crawl date and topsy account for nearly 66% of the closest answers, eliminating the web archives or Last-Modified from the results produces the largest overall negative impact on the results. The carbon date application is available for download or use via a webAPI.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2013-04
issn 2331-8422
language eng
recordid cdi_proquest_journals_2084923797
source Free E- Journals
subjects Applications programs
Carbon
Digital archives
Downloading
Internet resources
Standard data
title Carbon Dating The Web: Estimating the Age of Web Resources
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T05%3A52%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Carbon%20Dating%20The%20Web:%20Estimating%20the%20Age%20of%20Web%20Resources&rft.jtitle=arXiv.org&rft.au=SalahEldeen,%20Hany%20M&rft.date=2013-04-18&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2084923797%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2084923797&rft_id=info:pmid/&rfr_iscdi=true