The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment

The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) use...
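
As a rough illustration of how the two Gumbel parameters are used, the tail probability of an optimal local alignment score S for sequences of lengths m and n is commonly approximated as P(S ≥ y) ≈ 1 − exp(−k·m·n·e^(−λy)). The sketch below evaluates that standard form. The function names and the numerical values of λ and k are illustrative assumptions, not values taken from this record, and finite-length (edge-effect) corrections are ignored.

```python
import math


def expected_hits(score, m, n, lam, k):
    """Expected number of chance alignments scoring >= score: k*m*n*exp(-lam*score)."""
    return k * m * n * math.exp(-lam * score)


def gumbel_p_value(score, m, n, lam, k):
    """Gumbel tail approximation: P(S >= score) = 1 - exp(-E), with E from expected_hits."""
    e = expected_hits(score, m, n, lam, k)
    # -expm1(-e) equals 1 - exp(-e) but stays accurate when e is very small.
    return -math.expm1(-e)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    lam, k = 0.267, 0.041
    m = n = 140  # sequence lengths, also illustrative
    for y in (40, 50, 60):
        print(y, expected_hits(y, m, n, lam, k), gumbel_p_value(y, m, n, lam, k))
```

Because λ sits in the exponent while k enters linearly, a small relative error in λ changes the result far more than the same relative error in k.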

Bibliographic details
Published in: Nucleic acids research 2005-01, Vol.33 (15), p.4987-4994
Main authors: Sheetlin, Sergey; Park, Yonil; Spouge, John L.
Format: Article
Language: English
Online access: Full text
container_end_page 4994
container_issue 15
container_start_page 4987
container_title Nucleic acids research
container_volume 33
creator Sheetlin, Sergey
Park, Yonil
Spouge, John L.
description The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) all use time-consuming computer simulations to determine the Gumbel parameters. Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. The ultimate aim of this paper is to speed the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. Simulations for the scale parameter λ can be as much as five times faster if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend to determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the errors required (λ, 0.8%; k, 10%). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor.
doi_str_mv 10.1093/nar/gki800
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1199557</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>899990311</sourcerecordid><originalsourceid>FETCH-LOGICAL-c441t-b605d3d72401c42131eac4d990f57c116225ae1418a68284fa30c7d6de2aedc73</originalsourceid><addsrcrecordid>eNpdkUFv1DAUhC1ERZeFCz8AWRw4IKX1sx07uSChirZILRwoUtWL5ThO1l0nTu0ElX-Pq12V0ost-X2eN6NB6B2QIyA1Ox51PO63riLkBVoBE7TgtaAv0YowUhZAeHWIXqd0SwhwKPkrdAgCuKwrWKHN1cbis2VorMdTtEWnzRwi3uIun72eJttiH4z2WHvXj4MdZ2z0iBuLbZrdoOcMdDEMOLlh8Xp2YUw4dLj3oXn66w066LRP9u3-XqNfp1-vTs6Lix9n306-XBSGc5iLRpCyZa2knIDhFBhYbXhb16QrpQEQlJba5hyVFhWteKcZMbIVraXatkayNfq8052WZsgveXXUXk0xe41_VNBO_T8Z3Ub14bcCqOuyfBD4uBeI4W7JIdXgkrHe69GGJSlRlYISEBn88Ay8DUscczhFCRFSymx_jT7tIBNDStF2j06AqIf2VG5P7drL8Pun3v-h-7oyUOwAl2Z7_zjXcauEZLJU59c36ppVN5c_6XcF7C9muqbU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200677721</pqid></control><display><type>article</type><title>The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Sheetlin, Sergey ; Park, Yonil ; Spouge, John L.</creator><creatorcontrib>Sheetlin, Sergey ; Park, Yonil ; Spouge, John L.</creatorcontrib><description>The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) use all time-consuming computer simulations to determine the Gumbel parameters. 
Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. The ultimate aim of this paper is to speed the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. Simulations for the scale parameter λ can be as much as five times faster, if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend in determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the errors required (λ, 0.8%; k, 10%). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor.</description><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gki800</identifier><identifier>PMID: 16147981</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Computational Biology - methods ; Computer Simulation ; Data Interpretation, Statistical ; Sequence Alignment - methods ; Software</subject><ispartof>Nucleic acids research, 2005-01, Vol.33 (15), p.4987-4994</ispartof><rights>Copyright Oxford University Press(England) 2005</rights><rights>The Author 2005. Published by Oxford University Press. 
All rights reserved 2005</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c441t-b605d3d72401c42131eac4d990f57c116225ae1418a68284fa30c7d6de2aedc73</citedby><cites>FETCH-LOGICAL-c441t-b605d3d72401c42131eac4d990f57c116225ae1418a68284fa30c7d6de2aedc73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1199557/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1199557/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16147981$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Sheetlin, Sergey</creatorcontrib><creatorcontrib>Park, Yonil</creatorcontrib><creatorcontrib>Spouge, John L.</creatorcontrib><title>The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment</title><title>Nucleic acids research</title><addtitle>Nucl. Acids Res</addtitle><description>The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) use all time-consuming computer simulations to determine the Gumbel parameters. Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. The ultimate aim of this paper is to speed the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. 
Simulations for the scale parameter λ can be as much as five times faster, if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend in determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the errors required (λ, 0.8%; k, 10%). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor.</description><subject>Computational Biology - methods</subject><subject>Computer Simulation</subject><subject>Data Interpretation, Statistical</subject><subject>Sequence Alignment - methods</subject><subject>Software</subject><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpdkUFv1DAUhC1ERZeFCz8AWRw4IKX1sx07uSChirZILRwoUtWL5ThO1l0nTu0ElX-Pq12V0ost-X2eN6NB6B2QIyA1Ox51PO63riLkBVoBE7TgtaAv0YowUhZAeHWIXqd0SwhwKPkrdAgCuKwrWKHN1cbis2VorMdTtEWnzRwi3uIun72eJttiH4z2WHvXj4MdZ2z0iBuLbZrdoOcMdDEMOLlh8Xp2YUw4dLj3oXn66w066LRP9u3-XqNfp1-vTs6Lix9n306-XBSGc5iLRpCyZa2knIDhFBhYbXhb16QrpQEQlJba5hyVFhWteKcZMbIVraXatkayNfq8052WZsgveXXUXk0xe41_VNBO_T8Z3Ub14bcCqOuyfBD4uBeI4W7JIdXgkrHe69GGJSlRlYISEBn88Ay8DUscczhFCRFSymx_jT7tIBNDStF2j06AqIf2VG5P7drL8Pun3v-h-7oyUOwAl2Z7_zjXcauEZLJU59c36ppVN5c_6XcF7C9muqbU</recordid><startdate>20050101</startdate><enddate>20050101</enddate><creator>Sheetlin, Sergey</creator><creator>Park, Yonil</creator><creator>Spouge, John L.</creator><general>Oxford University Press</general><general>Oxford Publishing Limited 
(England)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20050101</creationdate><title>The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment</title><author>Sheetlin, Sergey ; Park, Yonil ; Spouge, John L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c441t-b605d3d72401c42131eac4d990f57c116225ae1418a68284fa30c7d6de2aedc73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Computational Biology - methods</topic><topic>Computer Simulation</topic><topic>Data Interpretation, Statistical</topic><topic>Sequence Alignment - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sheetlin, Sergey</creatorcontrib><creatorcontrib>Park, Yonil</creatorcontrib><creatorcontrib>Spouge, John L.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full 
archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sheetlin, Sergey</au><au>Park, Yonil</au><au>Spouge, John L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucl. Acids Res</addtitle><date>2005-01-01</date><risdate>2005</risdate><volume>33</volume><issue>15</issue><spage>4987</spage><epage>4994</epage><pages>4987-4994</pages><issn>0305-1048</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) use all time-consuming computer simulations to determine the Gumbel parameters. Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. 
The ultimate aim of this paper is to speed the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. Simulations for the scale parameter λ can be as much as five times faster, if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend in determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the errors required (λ, 0.8%; k, 10%). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>16147981</pmid><doi>10.1093/nar/gki800</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0305-1048
ispartof Nucleic acids research, 2005-01, Vol.33 (15), p.4987-4994
issn 0305-1048
1362-4962
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1199557
source Oxford Journals Open Access Collection; MEDLINE; DOAJ Directory of Open Access Journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects Computational Biology - methods
Computer Simulation
Data Interpretation, Statistical
Sequence Alignment - methods
Software
title The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T09%3A52%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Gumbel%20pre-factor%20k%20for%20gapped%20local%20alignment%20can%20be%20estimated%20from%20simulations%20of%20global%20alignment&rft.jtitle=Nucleic%20acids%20research&rft.au=Sheetlin,%20Sergey&rft.date=2005-01-01&rft.volume=33&rft.issue=15&rft.spage=4987&rft.epage=4994&rft.pages=4987-4994&rft.issn=0305-1048&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/gki800&rft_dat=%3Cproquest_pubme%3E899990311%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200677721&rft_id=info:pmid/16147981&rfr_iscdi=true