ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common...

Bibliographic details
Published in:	Nucleic acids research 2001-04, Vol.29 (7), p.1647-1652
Main author:	Rognes, T
Format:	Article
Language:	eng
Subjects:	Algorithms; Computational Biology - methods; Databases, Factual; Information Storage and Retrieval; ParAlign; Sensitivity and Specificity; Sequence Alignment - methods; Software
Online access:	Full text

container_end_page	1652
container_issue	7
container_start_page	1647
container_title	Nucleic acids research
container_volume	29
creator	Rognes, T
description	There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
doi_str_mv	10.1093/nar/29.7.1647
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_31274</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>374089931</sourcerecordid><originalsourceid>FETCH-LOGICAL-c439t-91bacff172a188a8d241aa7167f1958e0e73cfd31eb5d3ae9eac75125174bc673</originalsourceid><addsrcrecordid>eNqFkcFrFTEQxoNYbK0evcriwdu-ZpJsshEvpWgtFPSgJw9hNjv7Xspu9pnsK_S_bx592OqlEMiQ-c3HfPkYewd8BdzKs4jpTNiVWYFW5gU7AalFrawWL5_Ux-x1zjecg4JGvWLHAELrRtsT9vsHpvMxrOOnCqstJhxHGqtMf3YUPVW4b00Ul1Kt5xSWzVQNc6oSbkNfYewLGnNYwi1VPS7YYabyhMlvKL9hRwOOmd4e7lP26-uXnxff6uvvl1cX59e1V9IutYUO_TCAEQhti20vFCAa0GYA27TEyUg_9BKoa3qJZAm9aUA0YFTntZGn7POD7nbXTdT7sm7x4bYpTJju3IzB_duJYePW862TIIwq4x8P42kutvPippA9jSNGmnfZGcN5w614FoQWdFNOAT_8B97MuxTLHzhRtIzgaq9WP0A-zTknGv4uDNzto3UlWiesM24fbeHfP3X5SB-ylPc47qFE</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200572042</pqid></control><display><type>article</type><title>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</title><source>MEDLINE</source><source>Oxford Journals Open Access Collection</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Rognes, T</creator><creatorcontrib>Rognes, T</creatorcontrib><description>There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. 
The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/</description><identifier>ISSN: 1362-4962</identifier><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/29.7.1647</identifier><identifier>PMID: 11266569</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford Publishing Limited (England)</publisher><subject>Algorithms ; Computational Biology - methods ; Databases, Factual ; Information Storage and Retrieval ; ParAlign ; Sensitivity and Specificity ; Sequence Alignment - methods ; Software</subject><ispartof>Nucleic acids research, 2001-04, Vol.29 (7), p.1647-1652</ispartof><rights>Copyright Oxford University Press(England) Apr 1, 2001</rights><rights>Copyright © 2001 Oxford University Press 
2001</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c439t-91bacff172a188a8d241aa7167f1958e0e73cfd31eb5d3ae9eac75125174bc673</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC31274/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC31274/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,725,778,782,883,27907,27908,53774,53776</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/11266569$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Rognes, T</creatorcontrib><title>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</title><title>Nucleic acids research</title><addtitle>Nucleic Acids Res</addtitle><description>There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. 
This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/</description><subject>Algorithms</subject><subject>Computational Biology - methods</subject><subject>Databases, Factual</subject><subject>Information Storage and Retrieval</subject><subject>ParAlign</subject><subject>Sensitivity and Specificity</subject><subject>Sequence Alignment - methods</subject><subject>Software</subject><issn>1362-4962</issn><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2001</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkcFrFTEQxoNYbK0evcriwdu-ZpJsshEvpWgtFPSgJw9hNjv7Xspu9pnsK_S_bx592OqlEMiQ-c3HfPkYewd8BdzKs4jpTNiVWYFW5gU7AalFrawWL5_Ux-x1zjecg4JGvWLHAELrRtsT9vsHpvMxrOOnCqstJhxHGqtMf3YUPVW4b00Ul1Kt5xSWzVQNc6oSbkNfYewLGnNYwi1VPS7YYabyhMlvKL9hRwOOmd4e7lP26-uXnxff6uvvl1cX59e1V9IutYUO_TCAEQhti20vFCAa0GYA27TEyUg_9BKoa3qJZAm9aUA0YFTntZGn7POD7nbXTdT7sm7x4bYpTJju3IzB_duJYePW862TIIwq4x8P42kutvPippA9jSNGmnfZGcN5w614FoQWdFNOAT_8B97MuxTLHzhRtIzgaq9WP0A-zTknGv4uDNzto3UlWiesM24fbeHfP3X5SB-ylPc47qFE</recordid><startdate>20010401</startdate><enddate>20010401</enddate><creator>Rognes, T</creator><general>Oxford Publishing Limited (England)</general><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20010401</creationdate><title>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</title><author>Rognes, T</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c439t-91bacff172a188a8d241aa7167f1958e0e73cfd31eb5d3ae9eac75125174bc673</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2001</creationdate><topic>Algorithms</topic><topic>Computational Biology - methods</topic><topic>Databases, Factual</topic><topic>Information Storage and Retrieval</topic><topic>ParAlign</topic><topic>Sensitivity and Specificity</topic><topic>Sequence Alignment - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rognes, T</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS 
Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rognes, T</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucleic Acids Res</addtitle><date>2001-04-01</date><risdate>2001</risdate><volume>29</volume><issue>7</issue><spage>1647</spage><epage>1652</epage><pages>1647-1652</pages><issn>1362-4962</issn><issn>0305-1048</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. 
Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/</abstract><cop>England</cop><pub>Oxford Publishing Limited (England)</pub><pmid>11266569</pmid><doi>10.1093/nar/29.7.1647</doi><tpages>6</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1362-4962
ispartof	Nucleic acids research, 2001-04, Vol.29 (7), p.1647-1652
issn	1362-4962 0305-1048 1362-4962
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_31274
source	MEDLINE; Oxford Journals Open Access Collection; PubMed Central; Free Full-Text Journals in Chemistry
subjects	Algorithms; Computational Biology - methods; Databases, Factual; Information Storage and Retrieval; ParAlign; Sensitivity and Specificity; Sequence Alignment - methods; Software
title	ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T03%3A05%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ParAlign:%20a%20parallel%20sequence%20alignment%20algorithm%20for%20rapid%20and%20sensitive%20database%20searches&rft.jtitle=Nucleic%20acids%20research&rft.au=Rognes,%20T&rft.date=2001-04-01&rft.volume=29&rft.issue=7&rft.spage=1647&rft.epage=1652&rft.pages=1647-1652&rft.issn=1362-4962&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/29.7.1647&rft_dat=%3Cproquest_pubme%3E374089931%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200572042&rft_id=info:pmid/11266569&rfr_iscdi=true
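
The description field above names two scores: the exact optimal ungapped alignment score on every diagonal of the alignment matrix, and the Smith-Waterman (gapped) score used in the final stage. The following is a minimal scalar Python sketch of those two quantities, offered as an illustration only. It is not the SIMD implementation from the paper; the function names, the +1/-1 match/mismatch scoring, and the linear gap penalty of -2 are assumptions chosen for readability. The heuristic that combines several diagonal scores into an approximate gapped score is not reproduced here, because the record does not specify it.

```python
def ungapped_diagonal_scores(query, target, match=1, mismatch=-1):
    """Best ungapped local alignment score on each diagonal.

    Diagonals are indexed by d = j - i, where i indexes the query and
    j indexes the target. Scoring values are illustrative assumptions.
    """
    m, n = len(query), len(target)
    scores = {}
    for d in range(-(m - 1), n):
        best = run = 0
        i = max(0, -d)          # first query position on this diagonal
        j = i + d               # matching target position
        while i < m and j < n:
            s = match if query[i] == target[j] else mismatch
            run = max(0, run + s)   # restart when the running score drops below zero
            best = max(best, run)
            i += 1
            j += 1
        scores[d] = best
    return scores


def smith_waterman_score(query, target, match=1, mismatch=-1, gap=-2):
    """Optimal local alignment score with a linear gap penalty (assumed)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for qc in query:
        cur = [0]
        for j, tc in enumerate(target, start=1):
            s = match if qc == tc else mismatch
            cell = max(0,
                       prev[j - 1] + s,    # extend along the diagonal
                       prev[j] + gap,      # gap in the target
                       cur[j - 1] + gap)   # gap in the query
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


if __name__ == "__main__":
    q, t = "ACGTACGT", "ACGTTACGT"   # t has one extra T relative to q
    diag = ungapped_diagonal_scores(q, t)
    print("best ungapped diagonal score:", max(diag.values()))
    print("Smith-Waterman score:", smith_waterman_score(q, t))
```

With these assumed parameters, the single inserted base splits the match across two neighbouring diagonals, so the gapped score exceeds the best single-diagonal score. That gap between the two is what a heuristic combining several diagonals would try to approximate cheaply before the full Smith-Waterman pass.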