Performance issues in distributed shared-nothing information-retrieval systems
Many information-retrieval systems provide access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this article, this database is studied by u...
Published in: | Information processing & management 1996-11, Vol.32 (6), p.647-665 |
---|---|
Main authors: | Tomasic, Anthony; Garcia-Molina, Hector |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 665 |
---|---|
container_issue | 6 |
container_start_page | 647 |
container_title | Information processing & management |
container_volume | 32 |
creator | Tomasic, Anthony; Garcia-Molina, Hector |
description | Many information-retrieval systems provide access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this article, this database is studied using a trace-driven simulation. The study focuses on three issues in a distributed shared-nothing system: a physical-index design that accommodates truncations, inverted-index caching, and database scaling. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. The first assumes an "optimal" configuration for a single host and then scales the database linearly by duplicating the host architecture as needed. The second determines the optimal number of hosts for a fixed database size. |
doi_str_mv | 10.1016/S0306-4573(96)00019-2 |
format | Article |
publisher | Oxford: Elsevier Ltd |
coden | IPMADK |
ericid | EJ536173 |
rights | 1996; 1997 INIST-CNRS; Copyright Pergamon Press Inc. Nov 1996 |
peer_reviewed | yes |
tpages | 19 |
linktohtml | https://www.sciencedirect.com/science/article/pii/S0306457396000192 |
backlink | ERIC: http://eric.ed.gov/ERICWebPortal/detail?accno=EJ536173 ; Pascal Francis: http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=2509566 |
collection | ERIC; Pascal-Francis; CrossRef; Library & Information Science Abstracts (LISA) |
fulltext | fulltext |
identifier | ISSN: 0306-4573 |
ispartof | Information processing & management, 1996-11, Vol.32 (6), p.647-665 |
issn | 0306-4573 1873-5371 |
language | eng |
recordid | cdi_proquest_miscellaneous_57406415 |
source | Elsevier ScienceDirect Journals |
subjects | Abstracts Bibliographic Records Computer Simulation Distributed Computing Distributed processing Evaluation Exact sciences and technology FOLIO Higher Education Indexes Information and communication sciences Information Retrieval Information retrieval systems Information retrieval systems. Information and document management system Information science. Documentation Information systems INSPEC Inverted Files Local Area Networks Online data bases Online databases Performance Factors Query Processing Science and technology Sciences and techniques of general use Stanford University CA Studies Trace drive simulation |
title | Performance issues in distributed shared-nothing information-retrieval systems |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T07%3A38%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Performance%20issues%20in%20distributed%20shared-nothing%20information-retrieval%20systems&rft.jtitle=Information%20processing%20&%20management&rft.au=Tomasic,%20Anthony&rft.date=1996-11-01&rft.volume=32&rft.issue=6&rft.spage=647&rft.epage=665&rft.pages=647-665&rft.issn=0306-4573&rft.eissn=1873-5371&rft.coden=IPMADK&rft_id=info:doi/10.1016/S0306-4573(96)00019-2&rft_dat=%3Cproquest_cross%3E11058638%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=194903423&rft_id=info:pmid/&rft_ericid=EJ536173&rft_els_id=S0306457396000192&rfr_iscdi=true |
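The abstract names truncations and inverted-index caching as two of the issues that shape response time. A truncated query such as `comput*` matches every index term beginning with the given stem, so a sorted vocabulary lets the matching terms be found as one contiguous range. The sketch below is a hypothetical single-host illustration, not the physical-index design simulated in the paper: the class name `TruncationIndex`, the parameters `cache_size` and `disk_reads`, the in-memory dict standing in for on-disk inverted lists, and the LRU eviction policy are all assumptions made here.

```python
from bisect import bisect_left
from collections import OrderedDict


class TruncationIndex:
    """Hypothetical single-host inverted index: a sorted vocabulary for
    truncated (prefix) queries plus an LRU cache of inverted lists."""

    def __init__(self, postings: dict[str, list[int]], cache_size: int = 2):
        self._postings = postings              # term -> doc ids; stands in for disk
        self._vocab = sorted(postings)         # sorted terms make prefixes contiguous
        self._cache: OrderedDict[str, list[int]] = OrderedDict()
        self._cache_size = cache_size
        self.disk_reads = 0                    # counts simulated inverted-list fetches

    def _fetch(self, term: str) -> list[int]:
        """Return one inverted list, serving it from the cache when possible."""
        if term in self._cache:
            self._cache.move_to_end(term)      # mark as most recently used
            return self._cache[term]
        self.disk_reads += 1                   # cache miss: simulate a disk read
        plist = self._postings[term]
        self._cache[term] = plist
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)    # evict the least recently used list
        return plist

    def expand(self, prefix: str) -> list[str]:
        """All vocabulary terms matched by the truncated query 'prefix*'."""
        start = bisect_left(self._vocab, prefix)
        terms = []
        for term in self._vocab[start:]:
            if not term.startswith(prefix):
                break                          # past the contiguous prefix range
            terms.append(term)
        return terms

    def search(self, query: str) -> set[int]:
        """Evaluate a single term, or a truncated term ending in '*'."""
        if query.endswith("*"):
            terms = self.expand(query[:-1])
        else:
            terms = [query] if query in self._postings else []
        result: set[int] = set()
        for term in terms:
            result.update(self._fetch(term))   # OR together all matching lists
        return result


index = TruncationIndex(
    {"compiler": [1, 4], "computer": [2, 4, 7], "computing": [3], "physics": [5]},
    cache_size=2,
)
print(sorted(index.search("comput*")))  # [2, 3, 4, 7]
print(index.disk_reads)                 # 2 (both lists were cache misses)
index.search("computer")
print(index.disk_reads)                 # 2 (served from the cache)
```

A single truncated query can touch many inverted lists at once, which is one reason truncation and caching interact: with a small cache, a broad stem can evict lists that later queries would have reused.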
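The abstract places the study in a distributed shared-nothing system, where each host owns its own memory, disks and index, and hosts cooperate only by exchanging messages. One common way to spread an inverted index over such hosts is document partitioning. The sketch below assumes that scheme (the abstract does not say which organization the paper evaluates), assigns documents to hosts by `doc_id % num_hosts`, and uses the hypothetical helper names `build_partitions` and `broadcast_query`.

```python
from collections import defaultdict


def build_partitions(docs: dict[int, str], num_hosts: int) -> list[dict[str, set[int]]]:
    """Assign each document to one host and build that host's local
    inverted index; no index data is shared between hosts."""
    hosts: list[dict[str, set[int]]] = [defaultdict(set) for _ in range(num_hosts)]
    for doc_id, text in docs.items():
        local = hosts[doc_id % num_hosts]      # hypothetical placement rule
        for term in text.lower().split():
            local[term].add(doc_id)
    return hosts


def broadcast_query(hosts: list[dict[str, set[int]]], term: str) -> set[int]:
    """Send the term to every host, then merge the partial answers."""
    result: set[int] = set()
    for local in hosts:
        result |= local.get(term, set())       # .get avoids creating empty entries
    return result


docs = {
    1: "inverted index caching",
    2: "database scaling",
    3: "index truncation",
    4: "shared nothing index",
}
hosts = build_partitions(docs, num_hosts=2)
print(sorted(broadcast_query(hosts, "index")))  # [1, 3, 4]
```

With document partitioning every query visits every host, so adding hosts shrinks each host's share of the work but not the number of hosts a query touches. Partitioning by term instead routes each term to a single host, at the cost of moving inverted lists between hosts to answer multi-term queries.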