Performance issues in distributed shared-nothing information-retrieval systems

Many information-retrieval systems provide access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this article, this database is studied by u...

Bibliographic details
Published in: Information processing & management 1996-11, Vol.32 (6), p.647-665
Main authors: Tomasic, Anthony; Garcia-Molina, Hector
Format: Article
Language: English
Online access: Full text
container_end_page 665
container_issue 6
container_start_page 647
container_title Information processing & management
container_volume 32
creator Tomasic, Anthony
Garcia-Molina, Hector
description Many information-retrieval systems provide access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. This article studies that database using a trace-driven simulation. The study focuses on a physical-index design that accommodates truncations, inverted-index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. The first assumes an “optimal” configuration for a single host and then scales the database linearly by duplicating the host architecture as needed. The second determines the optimal number of hosts for a fixed database size.
doi_str_mv 10.1016/S0306-4573(96)00019-2
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_57406415</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ericid>EJ536173</ericid><els_id>S0306457396000192</els_id><sourcerecordid>11058638</sourcerecordid><originalsourceid>FETCH-LOGICAL-c415t-b3af77c231501322277af3523f81f569373f19a24041717ea56dd0b1b0a21b203</originalsourceid><addsrcrecordid>eNqFkU1LJDEQhoO44Oj6DxQakUUPvaaSTmKfZBE_dhEV1HNIpysa6enWVI_gvzfzwRy87CmH96mqlyeM7QP_DRz0yQOXXJeVMvKo1secc6hLscEmcGpkqaSBTTZZI1tsm-g1Q5UCMWG395jCkKau91hEohlSEfuijTSm2MxGbAt6cQnbsh_Gl9g_53TBj3Hoy4SZwg_XFfRJI07pJ_sRXEe4u3p32NPlxeP5dXlzd_X3_M9N6StQY9lIF4zxQoLiIIUQxrgglZDhFILStTQyQO1ExSswYNAp3ba8gYY7AY3gcof9Wu59S8N77jzaaSSPXed6HGZklam4zqcyePANfB1mqc_dLNRVzWUlZIbUEvJpIEoY7FuKU5c-LXA7V2wXiu3cn621XSi2Is8drpY78q4LKVuMtB4WitdK64ztLTFM0a_Ti39KajDz62erOAv7iJgs-Yj5P9qY0I-2HeJ_enwB_jWYOg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>194903423</pqid></control><display><type>article</type><title>Performance issues in distributed shared-nothing information-retrieval systems</title><source>Elsevier ScienceDirect Journals</source><creator>Tomasic, Anthony ; Garcia-Molina, Hector</creator><creatorcontrib>Tomasic, Anthony ; Garcia-Molina, Hector</creatorcontrib><description>Many information-retrieval systems provides access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this article, this database is studied by using a trace-driven simulation. It focuses on a physical-index design that accommodates truncations, inverted-index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. 
One way assumes an “optimal” configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. The second way determines the optimal number of hosts given a fixed database size.</description><identifier>ISSN: 0306-4573</identifier><identifier>EISSN: 1873-5371</identifier><identifier>DOI: 10.1016/S0306-4573(96)00019-2</identifier><identifier>CODEN: IPMADK</identifier><language>eng</language><publisher>Oxford: Elsevier Ltd</publisher><subject>Abstracts ; Bibliographic Records ; Computer Simulation ; Distributed Computing ; Distributed processing ; Evaluation ; Exact sciences and technology ; FOLIO ; Higher Education ; Indexes ; Information and communication sciences ; Information Retrieval ; Information retrieval systems ; Information retrieval systems. Information and document management system ; Information science. Documentation ; Information systems ; INSPEC ; Inverted Files ; Local Area Networks ; Online data bases ; Online databases ; Performance Factors ; Query Processing ; Science and technology ; Sciences and techniques of general use ; Stanford University CA ; Studies ; Trace drive simulation</subject><ispartof>Information processing &amp; management, 1996-11, Vol.32 (6), p.647-665</ispartof><rights>1996</rights><rights>1997 INIST-CNRS</rights><rights>Copyright Pergamon Press Inc. 
Nov 1996</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c415t-b3af77c231501322277af3523f81f569373f19a24041717ea56dd0b1b0a21b203</citedby><cites>FETCH-LOGICAL-c415t-b3af77c231501322277af3523f81f569373f19a24041717ea56dd0b1b0a21b203</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0306457396000192$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttp://eric.ed.gov/ERICWebPortal/detail?accno=EJ536173$$DView record in ERIC$$Hfree_for_read</backlink><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=2509566$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Tomasic, Anthony</creatorcontrib><creatorcontrib>Garcia-Molina, Hector</creatorcontrib><title>Performance issues in distributed shared-nothing information-retrieval systems</title><title>Information processing &amp; management</title><description>Many information-retrieval systems provides access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this article, this database is studied by using a trace-driven simulation. It focuses on a physical-index design that accommodates truncations, inverted-index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. One way assumes an “optimal” configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. 
The second way determines the optimal number of hosts given a fixed database size.</description><subject>Abstracts</subject><subject>Bibliographic Records</subject><subject>Computer Simulation</subject><subject>Distributed Computing</subject><subject>Distributed processing</subject><subject>Evaluation</subject><subject>Exact sciences and technology</subject><subject>FOLIO</subject><subject>Higher Education</subject><subject>Indexes</subject><subject>Information and communication sciences</subject><subject>Information Retrieval</subject><subject>Information retrieval systems</subject><subject>Information retrieval systems. Information and document management system</subject><subject>Information science. Documentation</subject><subject>Information systems</subject><subject>INSPEC</subject><subject>Inverted Files</subject><subject>Local Area Networks</subject><subject>Online data bases</subject><subject>Online databases</subject><subject>Performance Factors</subject><subject>Query Processing</subject><subject>Science and technology</subject><subject>Sciences and techniques of general use</subject><subject>Stanford University CA</subject><subject>Studies</subject><subject>Trace drive simulation</subject><issn>0306-4573</issn><issn>1873-5371</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>1996</creationdate><recordtype>article</recordtype><recordid>eNqFkU1LJDEQhoO44Oj6DxQakUUPvaaSTmKfZBE_dhEV1HNIpysa6enWVI_gvzfzwRy87CmH96mqlyeM7QP_DRz0yQOXXJeVMvKo1secc6hLscEmcGpkqaSBTTZZI1tsm-g1Q5UCMWG395jCkKau91hEohlSEfuijTSm2MxGbAt6cQnbsh_Gl9g_53TBj3Hoy4SZwg_XFfRJI07pJ_sRXEe4u3p32NPlxeP5dXlzd_X3_M9N6StQY9lIF4zxQoLiIIUQxrgglZDhFILStTQyQO1ExSswYNAp3ba8gYY7AY3gcof9Wu59S8N77jzaaSSPXed6HGZklam4zqcyePANfB1mqc_dLNRVzWUlZIbUEvJpIEoY7FuKU5c-LXA7V2wXiu3cn621XSi2Is8drpY78q4LKVuMtB4WitdK64ztLTFM0a_Ti39KajDz62erOAv7iJgs-Yj5P9qY0I-2HeJ_enwB_jWYOg</recordid><startdate>19961101</startdate><enddate>19961101</enddate><creator>Tomasic, Anthony</creator><creator>Garcia-Molina, 
Hector</creator><general>Elsevier Ltd</general><general>Elsevier Science</general><general>Elsevier Science Ltd</general><scope>7SW</scope><scope>BJH</scope><scope>BNH</scope><scope>BNI</scope><scope>BNJ</scope><scope>BNO</scope><scope>ERI</scope><scope>PET</scope><scope>REK</scope><scope>WWN</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope></search><sort><creationdate>19961101</creationdate><title>Performance issues in distributed shared-nothing information-retrieval systems</title><author>Tomasic, Anthony ; Garcia-Molina, Hector</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c415t-b3af77c231501322277af3523f81f569373f19a24041717ea56dd0b1b0a21b203</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>1996</creationdate><topic>Abstracts</topic><topic>Bibliographic Records</topic><topic>Computer Simulation</topic><topic>Distributed Computing</topic><topic>Distributed processing</topic><topic>Evaluation</topic><topic>Exact sciences and technology</topic><topic>FOLIO</topic><topic>Higher Education</topic><topic>Indexes</topic><topic>Information and communication sciences</topic><topic>Information Retrieval</topic><topic>Information retrieval systems</topic><topic>Information retrieval systems. Information and document management system</topic><topic>Information science. 
Documentation</topic><topic>Information systems</topic><topic>INSPEC</topic><topic>Inverted Files</topic><topic>Local Area Networks</topic><topic>Online data bases</topic><topic>Online databases</topic><topic>Performance Factors</topic><topic>Query Processing</topic><topic>Science and technology</topic><topic>Sciences and techniques of general use</topic><topic>Stanford University CA</topic><topic>Studies</topic><topic>Trace drive simulation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tomasic, Anthony</creatorcontrib><creatorcontrib>Garcia-Molina, Hector</creatorcontrib><collection>ERIC</collection><collection>ERIC (Ovid)</collection><collection>ERIC</collection><collection>ERIC</collection><collection>ERIC (Legacy Platform)</collection><collection>ERIC( SilverPlatter )</collection><collection>ERIC</collection><collection>ERIC PlusText (Legacy Platform)</collection><collection>Education Resources Information Center (ERIC)</collection><collection>ERIC</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><jtitle>Information processing &amp; management</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tomasic, Anthony</au><au>Garcia-Molina, Hector</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><ericid>EJ536173</ericid><atitle>Performance issues in distributed shared-nothing information-retrieval systems</atitle><jtitle>Information processing &amp; management</jtitle><date>1996-11-01</date><risdate>1996</risdate><volume>32</volume><issue>6</issue><spage>647</spage><epage>665</epage><pages>647-665</pages><issn>0306-4573</issn><eissn>1873-5371</eissn><coden>IPMADK</coden><abstract>Many information-retrieval systems provides access to 
abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this article, this database is studied by using a trace-driven simulation. It focuses on a physical-index design that accommodates truncations, inverted-index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. One way assumes an “optimal” configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. The second way determines the optimal number of hosts given a fixed database size.</abstract><cop>Oxford</cop><pub>Elsevier Ltd</pub><doi>10.1016/S0306-4573(96)00019-2</doi><tpages>19</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0306-4573
ispartof Information processing & management, 1996-11, Vol.32 (6), p.647-665
issn 0306-4573
1873-5371
language eng
recordid cdi_proquest_miscellaneous_57406415
source Elsevier ScienceDirect Journals
subjects Abstracts
Bibliographic Records
Computer Simulation
Distributed Computing
Distributed processing
Evaluation
Exact sciences and technology
FOLIO
Higher Education
Indexes
Information and communication sciences
Information Retrieval
Information retrieval systems
Information retrieval systems. Information and document management system
Information science. Documentation
Information systems
INSPEC
Inverted Files
Local Area Networks
Online data bases
Online databases
Performance Factors
Query Processing
Science and technology
Sciences and techniques of general use
Stanford University CA
Studies
Trace drive simulation
title Performance issues in distributed shared-nothing information-retrieval systems
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T07%3A38%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Performance%20issues%20in%20distributed%20shared-nothing%20information-retrieval%20systems&rft.jtitle=Information%20processing%20&%20management&rft.au=Tomasic,%20Anthony&rft.date=1996-11-01&rft.volume=32&rft.issue=6&rft.spage=647&rft.epage=665&rft.pages=647-665&rft.issn=0306-4573&rft.eissn=1873-5371&rft.coden=IPMADK&rft_id=info:doi/10.1016/S0306-4573(96)00019-2&rft_dat=%3Cproquest_cross%3E11058638%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=194903423&rft_id=info:pmid/&rft_ericid=EJ536173&rft_els_id=S0306457396000192&rfr_iscdi=true