ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common...

Full description

Bibliographic details
Published in: Nucleic acids research 2001-04, Vol.29 (7), p.1647-1652
Main author: Rognes, T
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1652
container_issue 7
container_start_page 1647
container_title Nucleic acids research
container_volume 29
creator Rognes, T
description There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
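The first and last stages named in the description can be illustrated with a minimal scalar sketch. It is not the ParAlign implementation: it is plain Python rather than SIMD code, the function names `diagonal_scores` and `smith_waterman_score` are hypothetical, and the match, mismatch, and linear gap values are placeholder assumptions instead of a real substitution matrix. The heuristic that combines several diagonal scores is not reproduced because the abstract does not specify it.

```python
def diagonal_scores(query, target, match=1, mismatch=-1):
    """Best ungapped local alignment score on every diagonal.

    Diagonal d = j - i, where i indexes the query and j the target.
    Scoring values are illustrative assumptions, not ParAlign's.
    """
    m, n = len(query), len(target)
    scores = {}
    for d in range(-(m - 1), n):
        i = max(0, -d)          # first query position on this diagonal
        j = i + d               # matching target position
        running = best = 0
        while i < m and j < n:
            s = match if query[i] == target[j] else mismatch
            running = max(0, running + s)   # restart when the run goes negative
            best = max(best, running)
            i += 1
            j += 1
        scores[d] = best
    return scores


def smith_waterman_score(query, target, match=1, mismatch=-1, gap=-2):
    """Optimal local alignment score with a linear gap penalty (assumed)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0]
        for j in range(1, len(target) + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            h = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(h)
            best = max(best, h)
        prev = curr
    return best
```

Because an ungapped alignment is a special case of a gapped one, the largest value returned by `diagonal_scores` never exceeds `smith_waterman_score` under the same scoring. That is why cheap per-diagonal scores are a plausible basis for deciding which database sequences deserve the full alignment.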
doi_str_mv 10.1093/nar/29.7.1647
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_31274</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>374089931</sourcerecordid><originalsourceid>FETCH-LOGICAL-c439t-91bacff172a188a8d241aa7167f1958e0e73cfd31eb5d3ae9eac75125174bc673</originalsourceid><addsrcrecordid>eNqFkcFrFTEQxoNYbK0evcriwdu-ZpJsshEvpWgtFPSgJw9hNjv7Xspu9pnsK_S_bx592OqlEMiQ-c3HfPkYewd8BdzKs4jpTNiVWYFW5gU7AalFrawWL5_Ux-x1zjecg4JGvWLHAELrRtsT9vsHpvMxrOOnCqstJhxHGqtMf3YUPVW4b00Ul1Kt5xSWzVQNc6oSbkNfYewLGnNYwi1VPS7YYabyhMlvKL9hRwOOmd4e7lP26-uXnxff6uvvl1cX59e1V9IutYUO_TCAEQhti20vFCAa0GYA27TEyUg_9BKoa3qJZAm9aUA0YFTntZGn7POD7nbXTdT7sm7x4bYpTJju3IzB_duJYePW862TIIwq4x8P42kutvPippA9jSNGmnfZGcN5w614FoQWdFNOAT_8B97MuxTLHzhRtIzgaq9WP0A-zTknGv4uDNzto3UlWiesM24fbeHfP3X5SB-ylPc47qFE</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200572042</pqid></control><display><type>article</type><title>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</title><source>MEDLINE</source><source>Oxford Journals Open Access Collection</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Rognes, T</creator><creatorcontrib>Rognes, T</creatorcontrib><description>There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. 
The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/</description><identifier>ISSN: 1362-4962</identifier><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/29.7.1647</identifier><identifier>PMID: 11266569</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford Publishing Limited (England)</publisher><subject>Algorithms ; Computational Biology - methods ; Databases, Factual ; Information Storage and Retrieval ; ParAlign ; Sensitivity and Specificity ; Sequence Alignment - methods ; Software</subject><ispartof>Nucleic acids research, 2001-04, Vol.29 (7), p.1647-1652</ispartof><rights>Copyright Oxford University Press(England) Apr 1, 2001</rights><rights>Copyright © 2001 Oxford University Press 
2001</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c439t-91bacff172a188a8d241aa7167f1958e0e73cfd31eb5d3ae9eac75125174bc673</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC31274/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC31274/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,725,778,782,883,27907,27908,53774,53776</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/11266569$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Rognes, T</creatorcontrib><title>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</title><title>Nucleic acids research</title><addtitle>Nucleic Acids Res</addtitle><description>There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. 
This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/</description><subject>Algorithms</subject><subject>Computational Biology - methods</subject><subject>Databases, Factual</subject><subject>Information Storage and Retrieval</subject><subject>ParAlign</subject><subject>Sensitivity and Specificity</subject><subject>Sequence Alignment - methods</subject><subject>Software</subject><issn>1362-4962</issn><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2001</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkcFrFTEQxoNYbK0evcriwdu-ZpJsshEvpWgtFPSgJw9hNjv7Xspu9pnsK_S_bx592OqlEMiQ-c3HfPkYewd8BdzKs4jpTNiVWYFW5gU7AalFrawWL5_Ux-x1zjecg4JGvWLHAELrRtsT9vsHpvMxrOOnCqstJhxHGqtMf3YUPVW4b00Ul1Kt5xSWzVQNc6oSbkNfYewLGnNYwi1VPS7YYabyhMlvKL9hRwOOmd4e7lP26-uXnxff6uvvl1cX59e1V9IutYUO_TCAEQhti20vFCAa0GYA27TEyUg_9BKoa3qJZAm9aUA0YFTntZGn7POD7nbXTdT7sm7x4bYpTJju3IzB_duJYePW862TIIwq4x8P42kutvPippA9jSNGmnfZGcN5w614FoQWdFNOAT_8B97MuxTLHzhRtIzgaq9WP0A-zTknGv4uDNzto3UlWiesM24fbeHfP3X5SB-ylPc47qFE</recordid><startdate>20010401</startdate><enddate>20010401</enddate><creator>Rognes, T</creator><general>Oxford Publishing Limited (England)</general><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20010401</creationdate><title>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</title><author>Rognes, T</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c439t-91bacff172a188a8d241aa7167f1958e0e73cfd31eb5d3ae9eac75125174bc673</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2001</creationdate><topic>Algorithms</topic><topic>Computational Biology - methods</topic><topic>Databases, Factual</topic><topic>Information Storage and Retrieval</topic><topic>ParAlign</topic><topic>Sensitivity and Specificity</topic><topic>Sequence Alignment - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rognes, T</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS 
Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rognes, T</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucleic Acids Res</addtitle><date>2001-04-01</date><risdate>2001</risdate><volume>29</volume><issue>7</issue><spage>1647</spage><epage>1652</epage><pages>1647-1652</pages><issn>1362-4962</issn><issn>0305-1048</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. 
Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/</abstract><cop>England</cop><pub>Oxford Publishing Limited (England)</pub><pmid>11266569</pmid><doi>10.1093/nar/29.7.1647</doi><tpages>6</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1362-4962
ispartof Nucleic acids research, 2001-04, Vol.29 (7), p.1647-1652
issn 1362-4962
0305-1048
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_31274
source MEDLINE; Oxford Journals Open Access Collection; PubMed Central; Free Full-Text Journals in Chemistry
subjects Algorithms
Computational Biology - methods
Databases, Factual
Information Storage and Retrieval
ParAlign
Sensitivity and Specificity
Sequence Alignment - methods
Software
title ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T03%3A05%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ParAlign:%20a%20parallel%20sequence%20alignment%20algorithm%20for%20rapid%20and%20sensitive%20database%20searches&rft.jtitle=Nucleic%20acids%20research&rft.au=Rognes,%20T&rft.date=2001-04-01&rft.volume=29&rft.issue=7&rft.spage=1647&rft.epage=1652&rft.pages=1647-1652&rft.issn=1362-4962&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/29.7.1647&rft_dat=%3Cproquest_pubme%3E374089931%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200572042&rft_id=info:pmid/11266569&rfr_iscdi=true