A technique for measuring the relative size and overlap of public Web search engines
Search engines are among the most useful and popular services on the Web. Users are eager to know how they compare. Which one has the largest coverage? Have they indexed the same portion of the Web? How many pages are out there? Although these questions have been debated in the popular and technical...
Published in: | Computer networks (Amsterdam, Netherlands : 1999), 1998-04, Vol.30 (1), p.379-388 |
---|---|
Main authors: | Bharat, Krishna ; Broder, Andrei |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 388 |
---|---|
container_issue | 1 |
container_start_page | 379 |
container_title | Computer networks (Amsterdam, Netherlands : 1999) |
container_volume | 30 |
creator | Bharat, Krishna ; Broder, Andrei |
description | Search engines are among the most useful and popular services on the Web. Users are eager to know how they compare. Which one has the largest coverage? Have they indexed the same portion of the Web? How many pages are out there? Although these questions have been debated in the popular and technical press, no objective evaluation methodology has been proposed and few clear answers have emerged. In this paper we describe a standardized, statistical way of measuring search engine coverage and overlap through random queries. Our technique does not require privileged access to any database. It can be implemented by third-party evaluators using only public query interfaces.
We present results from our experiments showing size and overlap estimates for HotBot, AltaVista, Excite, and Infoseek as percentages of their total joint coverage in mid-1997 and in November 1997. Our method does not provide absolute values. However, using data from other sources, we estimate that as of November 1997 the numbers of pages indexed by HotBot, AltaVista, Excite, and Infoseek were roughly 77M, 100M, 32M, and 17M, respectively, and that the joint total coverage was 160 million pages. We further conjecture that the size of the static, public Web as of November was over 200 million pages. The most startling finding is that the overlap is very small: less than 1.4% of the total coverage, or about 2.2 million pages, was indexed by all four engines. |
doi_str_mv | 10.1016/S0169-7552(98)00127-5 |
format | Article |
publisher | Amsterdam: Elsevier B.V |
coden | CNISE9 |
eissn | 1872-7069 |
creationdate | 1998-04-01 |
tpages | 10 |
rights | 1998 INIST-CNRS ; Copyright Elsevier Sequoia S.A. Apr 1998 |
toplevel | peer_reviewed ; online_resources |
fulltext | fulltext |
identifier | ISSN: 0169-7552 |
ispartof | Computer networks (Amsterdam, Netherlands : 1999), 1998-04, Vol.30 (1), p.379-388 |
issn | 0169-7552 ; 1389-1286 ; 1872-7069 |
language | eng |
recordid | cdi_proquest_miscellaneous_57459824 |
source | Alma/SFX Local Collection |
subjects | Applied sciences ; Computer science; control theory; systems ; Coverage ; Exact sciences and technology ; Information systems. Data bases ; Memory organisation. Data processing ; Overlap ; Search engines ; Size ; Software ; Studies ; Web page sampling ; Websites ; World Wide Web |
title | A technique for measuring the relative size and overlap of public Web search engines |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T12%3A16%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20technique%20for%20measuring%20the%20relative%20size%20and%20overlap%20of%20public%20Web%20search%20engines&rft.jtitle=Computer%20networks%20(Amsterdam,%20Netherlands%20:%201999)&rft.au=Bharat,%20Krishna&rft.date=1998-04-01&rft.volume=30&rft.issue=1&rft.spage=379&rft.epage=388&rft.pages=379-388&rft.issn=0169-7552&rft.eissn=1872-7069&rft.coden=CNISE9&rft_id=info:doi/10.1016/S0169-7552(98)00127-5&rft_dat=%3Cproquest_cross%3E29436750%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=199645356&rft_id=info:pmid/&rft_els_id=S0169755298001275&rfr_iscdi=true |
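
The figures quoted in the description can be cross-checked with a little arithmetic. The following is a minimal sketch, not the authors' code: it only re-derives percentages from the numbers stated in the abstract (77M, 100M, 32M, 17M, a joint coverage of 160 million pages, and a four-way overlap below 1.4%). The function and variable names are illustrative assumptions, and the abstract's own estimates are treated as given inputs.

```python
# Figures quoted in the abstract (November 1997), in millions of pages.
# These are the paper's published estimates, not measured here.
indexed_millions = {
    "HotBot": 77,
    "AltaVista": 100,
    "Excite": 32,
    "Infoseek": 17,
}
JOINT_COVERAGE_MILLIONS = 160       # joint total coverage of the four engines
FOUR_WAY_OVERLAP_FRACTION = 0.014   # "less than 1.4%" of the joint coverage


def share_of_joint(size_millions, joint_millions=JOINT_COVERAGE_MILLIONS):
    """Return an engine's index size as a percentage of the joint coverage."""
    return 100.0 * size_millions / joint_millions


def overlap_upper_bound_millions(
    fraction=FOUR_WAY_OVERLAP_FRACTION, joint_millions=JOINT_COVERAGE_MILLIONS
):
    """Upper bound (in millions) on pages indexed by all four engines."""
    return fraction * joint_millions


def summed_sizes_millions(sizes=indexed_millions):
    """Sum of the individual index sizes; exceeds the joint coverage
    whenever some pages are indexed by more than one engine."""
    return sum(sizes.values())


if __name__ == "__main__":
    for name, size in indexed_millions.items():
        print(f"{name}: {share_of_joint(size):.1f}% of joint coverage")
    print(f"sum of individual sizes: {summed_sizes_millions()}M")
    print(f"four-way overlap: under {overlap_upper_bound_millions():.2f}M pages")
```

The individual sizes add up to more than the joint coverage, which is what one expects when indexes partly overlap, and 1.4% of 160 million is consistent with the "about 2.2 million pages" stated in the abstract.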