ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches
There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common...
Published in: | Nucleic acids research 2001-04, Vol.29 (7), p.1647-1652 |
---|---|
Main author: | Rognes, T |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1652 |
---|---|
container_issue | 7 |
container_start_page | 1647 |
container_title | Nucleic acids research |
container_volume | 29 |
creator | Rognes, T |
description | There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/ |
doi_str_mv | 10.1093/nar/29.7.1647 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_31274</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>374089931</sourcerecordid><originalsourceid>FETCH-LOGICAL-c439t-91bacff172a188a8d241aa7167f1958e0e73cfd31eb5d3ae9eac75125174bc673</originalsourceid><addsrcrecordid>eNqFkcFrFTEQxoNYbK0evcriwdu-ZpJsshEvpWgtFPSgJw9hNjv7Xspu9pnsK_S_bx592OqlEMiQ-c3HfPkYewd8BdzKs4jpTNiVWYFW5gU7AalFrawWL5_Ux-x1zjecg4JGvWLHAELrRtsT9vsHpvMxrOOnCqstJhxHGqtMf3YUPVW4b00Ul1Kt5xSWzVQNc6oSbkNfYewLGnNYwi1VPS7YYabyhMlvKL9hRwOOmd4e7lP26-uXnxff6uvvl1cX59e1V9IutYUO_TCAEQhti20vFCAa0GYA27TEyUg_9BKoa3qJZAm9aUA0YFTntZGn7POD7nbXTdT7sm7x4bYpTJju3IzB_duJYePW862TIIwq4x8P42kutvPippA9jSNGmnfZGcN5w614FoQWdFNOAT_8B97MuxTLHzhRtIzgaq9WP0A-zTknGv4uDNzto3UlWiesM24fbeHfP3X5SB-ylPc47qFE</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200572042</pqid></control><display><type>article</type><title>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</title><source>MEDLINE</source><source>Oxford Journals Open Access Collection</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Rognes, T</creator><creatorcontrib>Rognes, T</creatorcontrib><description>There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. 
The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/</description><identifier>ISSN: 1362-4962</identifier><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/29.7.1647</identifier><identifier>PMID: 11266569</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford Publishing Limited (England)</publisher><subject>Algorithms ; Computational Biology - methods ; Databases, Factual ; Information Storage and Retrieval ; ParAlign ; Sensitivity and Specificity ; Sequence Alignment - methods ; Software</subject><ispartof>Nucleic acids research, 2001-04, Vol.29 (7), p.1647-1652</ispartof><rights>Copyright Oxford University Press(England) Apr 1, 2001</rights><rights>Copyright © 2001 Oxford University Press 
2001</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c439t-91bacff172a188a8d241aa7167f1958e0e73cfd31eb5d3ae9eac75125174bc673</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC31274/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC31274/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,725,778,782,883,27907,27908,53774,53776</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/11266569$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Rognes, T</creatorcontrib><title>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</title><title>Nucleic acids research</title><addtitle>Nucleic Acids Res</addtitle><description>There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. 
This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/</description><subject>Algorithms</subject><subject>Computational Biology - methods</subject><subject>Databases, Factual</subject><subject>Information Storage and Retrieval</subject><subject>ParAlign</subject><subject>Sensitivity and Specificity</subject><subject>Sequence Alignment - methods</subject><subject>Software</subject><issn>1362-4962</issn><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2001</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkcFrFTEQxoNYbK0evcriwdu-ZpJsshEvpWgtFPSgJw9hNjv7Xspu9pnsK_S_bx592OqlEMiQ-c3HfPkYewd8BdzKs4jpTNiVWYFW5gU7AalFrawWL5_Ux-x1zjecg4JGvWLHAELrRtsT9vsHpvMxrOOnCqstJhxHGqtMf3YUPVW4b00Ul1Kt5xSWzVQNc6oSbkNfYewLGnNYwi1VPS7YYabyhMlvKL9hRwOOmd4e7lP26-uXnxff6uvvl1cX59e1V9IutYUO_TCAEQhti20vFCAa0GYA27TEyUg_9BKoa3qJZAm9aUA0YFTntZGn7POD7nbXTdT7sm7x4bYpTJju3IzB_duJYePW862TIIwq4x8P42kutvPippA9jSNGmnfZGcN5w614FoQWdFNOAT_8B97MuxTLHzhRtIzgaq9WP0A-zTknGv4uDNzto3UlWiesM24fbeHfP3X5SB-ylPc47qFE</recordid><startdate>20010401</startdate><enddate>20010401</enddate><creator>Rognes, T</creator><general>Oxford Publishing Limited (England)</general><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20010401</creationdate><title>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</title><author>Rognes, T</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c439t-91bacff172a188a8d241aa7167f1958e0e73cfd31eb5d3ae9eac75125174bc673</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2001</creationdate><topic>Algorithms</topic><topic>Computational Biology - methods</topic><topic>Databases, Factual</topic><topic>Information Storage and Retrieval</topic><topic>ParAlign</topic><topic>Sensitivity and Specificity</topic><topic>Sequence Alignment - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rognes, T</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS 
Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rognes, T</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucleic Acids Res</addtitle><date>2001-04-01</date><risdate>2001</risdate><volume>29</volume><issue>7</issue><spage>1647</spage><epage>1652</epage><pages>1647-1652</pages><issn>1362-4962</issn><issn>0305-1048</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. 
Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/</abstract><cop>England</cop><pub>Oxford Publishing Limited (England)</pub><pmid>11266569</pmid><doi>10.1093/nar/29.7.1647</doi><tpages>6</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1362-4962 |
ispartof | Nucleic acids research, 2001-04, Vol.29 (7), p.1647-1652 |
issn | 1362-4962 0305-1048 1362-4962 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_31274 |
source | MEDLINE; Oxford Journals Open Access Collection; PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Algorithms Computational Biology - methods Databases, Factual Information Storage and Retrieval ParAlign Sensitivity and Specificity Sequence Alignment - methods Software |
title | ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T03%3A05%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ParAlign:%20a%20parallel%20sequence%20alignment%20algorithm%20for%20rapid%20and%20sensitive%20database%20searches&rft.jtitle=Nucleic%20acids%20research&rft.au=Rognes,%20T&rft.date=2001-04-01&rft.volume=29&rft.issue=7&rft.spage=1647&rft.epage=1652&rft.pages=1647-1652&rft.issn=1362-4962&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/29.7.1647&rft_dat=%3Cproquest_pubme%3E374089931%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200572042&rft_id=info:pmid/11266569&rfr_iscdi=true |
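
The first stage named in the abstract, computing the exact optimal ungapped alignment score for every diagonal of the alignment matrix, can be illustrated with a small scalar sketch. This is not the ParAlign implementation: the record gives no code, the paper's SIMD parallelism is not reproduced here, and the function name `ungapped_diagonal_scores` and the simple `match`/`mismatch` scoring are assumptions chosen for illustration (a real search would presumably use a substitution matrix).

```python
def ungapped_diagonal_scores(query, subject, match=1, mismatch=-1):
    """Best ungapped local alignment score on each diagonal.

    Diagonal d holds the cells (i, j) with j - i == d, where i indexes
    `query` and j indexes `subject`. The match/mismatch scoring is a
    hypothetical stand-in for a substitution matrix.
    """
    m, n = len(query), len(subject)
    scores = {}
    for d in range(-(m - 1), n):
        i = max(0, -d)   # first query position on this diagonal
        j = i + d        # matching subject position
        best = run = 0
        while i < m and j < n:
            s = match if query[i] == subject[j] else mismatch
            # Maximum-scoring contiguous run: reset when the sum drops below 0.
            run = max(0, run + s)
            best = max(best, run)
            i += 1
            j += 1
        scores[d] = best
    return scores


if __name__ == "__main__":
    per_diagonal = ungapped_diagonal_scores("ACGT", "TTACGT")
    print(max(per_diagonal, key=per_diagonal.get), max(per_diagonal.values()))
```

Each diagonal is scanned independently of the others, which is the property that makes this stage a natural fit for the data-parallel processing the abstract describes.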