The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment
Published in: | Nucleic acids research 2005-01, Vol.33 (15), p.4987-4994 |
---|---|
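Main authors: | Sheetlin, Sergey; Park, Yonil; Spouge, John L. |
Format: | Article |
Language: | eng |
Subjects: | Computational Biology - methods; Computer Simulation; Data Interpretation, Statistical; Sequence Alignment - methods; Software |
Online access: | Full text |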
container_end_page | 4994 |
---|---|
container_issue | 15 |
container_start_page | 4987 |
container_title | Nucleic acids research |
container_volume | 33 |
creator | Sheetlin, Sergey; Park, Yonil; Spouge, John L. |
description | The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters: the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) all use time-consuming computer simulations to determine the Gumbel parameters. Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. The ultimate aim of this paper is to speed up the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. Simulations for the scale parameter λ can be as much as five times faster if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend to determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that, for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the required errors (0.8% for λ; 10% for k). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor. |
doi_str_mv | 10.1093/nar/gki800 |
format | Article |
fullrecord | PMID: 16147981; PMCID: PMC1199557; CODEN: NARHAD; publisher: Oxford University Press, England; rights: Copyright Oxford University Press (England) 2005; The Author 2005, published by Oxford University Press, all rights reserved; peer reviewed; free to read: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1199557/ (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1199557/pdf/); MEDLINE/PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/16147981 |
fulltext | fulltext |
identifier | ISSN: 0305-1048 |
ispartof | Nucleic acids research, 2005-01, Vol.33 (15), p.4987-4994 |
issn | 0305-1048 (print); 1362-4962 (electronic) |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1199557 |
source | Oxford Journals Open Access Collection; MEDLINE; DOAJ Directory of Open Access Journals; PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Computational Biology - methods; Computer Simulation; Data Interpretation, Statistical; Sequence Alignment - methods; Software |
title | The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T09%3A52%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Gumbel%20pre-factor%20k%20for%20gapped%20local%20alignment%20can%20be%20estimated%20from%20simulations%20of%20global%20alignment&rft.jtitle=Nucleic%20acids%20research&rft.au=Sheetlin,%20Sergey&rft.date=2005-01-01&rft.volume=33&rft.issue=15&rft.spage=4987&rft.epage=4994&rft.pages=4987-4994&rft.issn=0305-1048&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/gki800&rft_dat=%3Cproquest_pubme%3E899990311%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200677721&rft_id=info:pmid/16147981&rfr_iscdi=true |
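The description above says that BLAST determines the Gumbel parameters λ and k by computer simulation. Under the Gumbel approximation, the optimal local score S of two random sequences of lengths m and n satisfies P(S ≥ x) ≈ 1 − exp(−k·m·n·e^(−λx)). The following is a minimal sketch, under stated assumptions, of the conventional local-alignment simulation that the paper sets out to accelerate; it is not the paper's global-alignment method. It scores random sequence pairs with Smith–Waterman local alignment and fits λ and k by the method of moments. The DNA alphabet, the match/mismatch/gap values, the linear (not affine) gap penalty, and all function names are hypothetical choices for illustration, and finite-length edge effects are ignored, so estimates from short sequences are rough.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (mean offset of a Gumbel)


def gumbel_pvalue(score, lam, k, m, n):
    """P(S >= score) under the Gumbel approximation P(S < x) = exp(-k*m*n*exp(-lam*x))."""
    expected_hits = k * m * n * math.exp(-lam * score)
    return -math.expm1(-expected_hits)  # 1 - exp(-E), computed stably for small E


def smith_waterman_score(a, b, match=1, mismatch=-2, gap=-3):
    """Optimal local alignment score; hypothetical scores and a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a score never drops below 0 (start a fresh alignment).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def estimate_gumbel(m, n, trials, alphabet="ACGT", seed=0):
    """Hypothetical helper: fit (lam, k) to simulated local scores by the method of moments."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        a = "".join(rng.choice(alphabet) for _ in range(m))
        b = "".join(rng.choice(alphabet) for _ in range(n))
        scores.append(smith_waterman_score(a, b))
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6))  # Gumbel variance is pi^2 / (6 * lam^2)
    mu = mean - EULER_GAMMA / lam        # Gumbel mean is mu + gamma / lam
    k = math.exp(lam * mu) / (m * n)     # location mu = ln(k*m*n) / lam
    return lam, k


if __name__ == "__main__":
    lam, k = estimate_gumbel(m=100, n=100, trials=200)
    print(f"lambda ~ {lam:.3f}, k ~ {k:.3f}")
    print("P(S >= 12) ~", gumbel_pvalue(12, lam, k, 100, 100))
```

The method of moments is used only for brevity. Because every realization requires a full local alignment, accuracy is bought with many long simulated sequences; this cost is what motivates the paper's use of global alignment instead. The sketch also does not reproduce the BLASTP default scoring system, so its numbers are not comparable with the error figures quoted in the description.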