An Efficient Alignment Algorithm for Searching Simple Pseudoknots over Long Genomic Sequence

Structural alignment has been shown to be an effective computational method to identify structural noncoding RNA (ncRNA) candidates as ncRNAs are known to be conserved in secondary structures. However, the complexity of the structural alignment algorithms becomes higher when the structure has pseudo...

Bibliographic Details
Published in: IEEE/ACM transactions on computational biology and bioinformatics 2012-11, Vol.9 (6), p.1629-1638
Main authors: Ma, C., Wong, T. K. F., Lam, T. W., Hon, W. K., Sadakane, K., Yiu, S. M.
Format: Article
Language: eng
container_end_page 1638
container_issue 6
container_start_page 1629
container_title IEEE/ACM transactions on computational biology and bioinformatics
container_volume 9
creator Ma, C.
Wong, T. K. F.
Lam, T. W.
Hon, W. K.
Sadakane, K.
Yiu, S. M.
description Structural alignment has been shown to be an effective computational method for identifying structural noncoding RNA (ncRNA) candidates, because ncRNAs are known to be conserved in secondary structure. However, the complexity of structural alignment algorithms increases when the structure contains pseudoknots. Even for the simplest type of pseudoknot (simple pseudoknots), the fastest algorithm runs in O(mn^3) time, where m and n are the length of the query ncRNA (with known structure) and the length of the target sequence (with unknown structure), respectively. In practice, we are usually given a long DNA sequence and try to locate regions in it that are possible candidates for a particular ncRNA. Thus, the structural alignment algorithm must be run on every possible region of the long sequence. For example, finding candidates for a known ncRNA of length 100 on a sequence of length 50,000 takes more than one day. In this paper, we provide an efficient algorithm that solves the problem for simple pseudoknots and is shown to be 10 times faster. The speedup stems from an effective pruning strategy that computes a lower bound on the score of the optimal alignment and estimates the maximum score a candidate can achieve, in order to decide whether to prune the current candidate.
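The pruning idea in the description can be illustrated with a minimal sketch. It is not the paper's algorithm: the expensive pseudoknot alignment is replaced here by a longest-common-subsequence score, and the cheap estimate is a shared-character count, which can never be smaller than that score. The names `lcs_length`, `upper_bound`, and `search` are hypothetical and chosen only for this illustration. The best score found so far plays the role of the lower bound on the optimum, and a candidate window is skipped when its optimistic estimate cannot exceed it.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Stand-in for the expensive alignment score (assumption, not the paper's scoring)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def upper_bound(query_counts: Counter, window: str) -> int:
    """Cheap optimistic estimate: the shared character count is never below the LCS length."""
    window_counts = Counter(window)
    return sum(min(n, window_counts[ch]) for ch, n in query_counts.items())


def search(query: str, target: str, window_len: int):
    """Scan every window of target; return (best_score, best_start, pruned_count)."""
    query_counts = Counter(query)
    best_score, best_start, pruned = -1, -1, 0
    for start in range(len(target) - window_len + 1):
        window = target[start:start + window_len]
        # Prune: this candidate cannot beat the current lower bound on the optimum.
        if upper_bound(query_counts, window) <= best_score:
            pruned += 1
            continue
        score = lcs_length(query, window)  # expensive step, run only on survivors
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start, pruned
```

Because the estimate never underestimates the stand-in score, pruning leaves the best score unchanged compared with scoring every window; only the number of expensive evaluations drops. How much is saved depends on how tight the estimate is, which is the part the paper addresses for simple pseudoknots.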
doi_str_mv 10.1109/TCBB.2012.104
format Article
pmid 22848134
coden ITCBCY
publisher United States: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Nov/Dec 2012
fulltext fulltext_linktorsrc
identifier ISSN: 1545-5963
ispartof IEEE/ACM transactions on computational biology and bioinformatics, 2012-11, Vol.9 (6), p.1629-1638
issn 1545-5963
1557-9964
language eng
recordid cdi_proquest_miscellaneous_1318096718
source IEEE Electronic Library (IEL)
subjects Algorithm design and analysis
Algorithms
Bioinformatics
Candidates
Complexity theory
Computational biology
Computational Biology - methods
DNA - chemistry
DNA - genetics
Genome
Genomics
Heuristic algorithms
Models, Genetic
Noncoding RNAs
Nucleic Acid Conformation
pseudoknot
RNA
RNA, Untranslated - chemistry
RNA, Untranslated - genetics
Sequence Analysis, DNA - methods
Software
structural alignment
Studies
title An Efficient Alignment Algorithm for Searching Simple Pseudoknots over Long Genomic Sequence
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T03%3A03%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Efficient%20Alignment%20Algorithm%20for%20Searching%20Simple%20Pseudoknots%20over%20Long%20Genomic%20Sequence&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Ma,%20C.&rft.date=2012-11-01&rft.volume=9&rft.issue=6&rft.spage=1629&rft.epage=1638&rft.pages=1629-1638&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2012.104&rft_dat=%3Cproquest_RIE%3E2838383061%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1237004417&rft_id=info:pmid/22848134&rft_ieee_id=6256660&rfr_iscdi=true