Biclustering as a method for RNA local multiple sequence alignment

Motivations: Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grou...
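
The full abstract in this record names the sum-of-pairs score (SPS) as one of the measures used to compare alignments. As a rough illustration only, the sketch below computes an SPS in the commonly used benchmark sense: the fraction of residue pairs aligned in a reference alignment that are also aligned in a test alignment. The function names, the gap character `-`, and the toy sequences are assumptions made for this example; the record does not describe the exact scoring procedure used in the paper.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Collect every pair of residues that share a column.

    Each residue is identified as (row index, ungapped position), so the
    result can be compared across alignments of the same sequences.
    Assumes all rows have equal length (hypothetical input format).
    """
    counters = [0] * len(alignment)  # next ungapped position per row
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for row, seq in enumerate(alignment):
            if seq[col] != gap:
                present.append((row, counters[row]))
                counters[row] += 1
        # Rows are visited in order, so each pair has a canonical orientation.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference, gap="-"):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference, gap)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test, gap)) / len(ref_pairs)


# Toy example: the test alignment recovers 3 of the 4 reference pairs.
reference = ["ACGU", "ACGU"]
test = ["ACG-U", "AC-GU"]
print(sum_of_pairs_score(test, reference))
```

Identifying residues by their ungapped position keeps the comparison independent of where each alignment places its gaps, which is why two alignments of different widths can still be scored against each other.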

Bibliographic details
Published in:	Bioinformatics 2007-12, Vol.23 (24), p.3289-3296
Main authors:	Wang, Shu; Gutell, Robin R.; Miranker, Daniel P.
Format:	Article
Language:	eng
Subjects:	Algorithms; Artificial Intelligence; Base Sequence; Biological and medical sciences; Cluster Analysis; Fundamental and applied biological sciences. Psychology; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Molecular Sequence Data; Pattern Recognition, Automated - methods; Sequence Alignment - methods; Sequence Analysis, RNA - methods; Sequence Homology, Nucleic Acid

container_end_page	3296
container_issue	24
container_start_page	3289
container_title	Bioinformatics
container_volume	23
creator	Wang, Shu; Gutell, Robin R.; Miranker, Daniel P.
description	Motivations: Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering is intended to address. Results: We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension to that suite to larger problem sizes. Also, alignments were evaluated of two large datasets of current biological interest, T box sequences and Group IC1 Introns. The results were compared with alignments computed by ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using Sum of Pairs (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or greater sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. Availability: BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/ Contact: shuwang2006@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
doi_str_mv	10.1093/bioinformatics/btm485
format	Article
fullrecord	<record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2228335</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btm485</oup_id><sourcerecordid>19678654</sourcerecordid><originalsourceid>FETCH-LOGICAL-c576t-201c092075fec61342ca2cae7145c80784052d5dc5a520c4accd3b584a0acc3d3</originalsourceid><addsrcrecordid>eNqNkltrFTEUhYNY7EV_ghIE-zY210nyIrSltdZWQY4ovoScTOY0NZMck5mi_96UObTWF4VAAvnW2ntnBYDnGL3GSNGDpU8-9ikPZvS2HCzHgUn-COxg1qKGIK4e1zNtRcMkottgt5RrhDhmjD0B21gogpliO-DoyNswldFlH1fQFGjg4Mar1MHqDT99OIQhWRPgMIXRr4ODxf2YXLQOmuBXcXBxfAq2ehOKe7bZ98Dn05PF8Vlz8fHtu-PDi8Zy0Y61J2yRIkjw3tkWU0asqcsJzLiVSEiGOOl4Z7nhBFlmrO3okktmUD3Sju6BN7PveloOrrO1dDZBr7MfTP6lk_H64U30V3qVbjQhRFLKq8H-xiCnOkQZ9eCLdSGY6NJUdKsQEwT_G8SqFbLlrIIv_wKv05RjfYXKyApxcevGZ8jmVEp2_V3LGOnbLPXDLPWcZdW9-HPee9UmvAq82gCm1Iz6bKL15Z5TCrVKksqhmUvT-r9rN7PE16_x805k8nfdCiq4Pvv6TZ_zxZfL08uFfk9_A7O2zgs</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198678575</pqid></control><display><type>article</type><title>Biclustering as a method for RNA local multiple sequence alignment</title><source>Oxford Journals Open Access Collection</source><creator>Wang, Shu ; Gutell, Robin R. ; Miranker, Daniel P.</creator><creatorcontrib>Wang, Shu ; Gutell, Robin R. ; Miranker, Daniel P.</creatorcontrib><description>Motivations: Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering is intended to address. Results: We define a representation of the MSA problem enabling the application of biclustering algorithms. 
We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension to that suite to larger problem sizes. Also, alignments were evaluated of two large datasets of current biological interest, T box sequences and Group IC1 Introns. The results were compared with alignments computed by ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using Sum of Pairs (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or greater sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. Availability: BlockMSA is implemented in Java. 
Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/ Contact: shuwang2006@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btm485</identifier><identifier>PMID: 17921494</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; Artificial Intelligence ; Base Sequence ; Biological and medical sciences ; Cluster Analysis ; Fundamental and applied biological sciences. Psychology ; General aspects ; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects) ; Molecular Sequence Data ; Pattern Recognition, Automated - methods ; Sequence Alignment - methods ; Sequence Analysis, RNA - methods ; Sequence Homology, Nucleic Acid</subject><ispartof>Bioinformatics, 2007-12, Vol.23 (24), p.3289-3296</ispartof><rights>2007 The Author(s) 2007</rights><rights>2008 INIST-CNRS</rights><rights>2007 The 
Author(s)</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c576t-201c092075fec61342ca2cae7145c80784052d5dc5a520c4accd3b584a0acc3d3</citedby><cites>FETCH-LOGICAL-c576t-201c092075fec61342ca2cae7145c80784052d5dc5a520c4accd3b584a0acc3d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2228335/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2228335/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,1604,27924,27925,53791,53793</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btm485$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=19906982$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/17921494$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Wang, Shu</creatorcontrib><creatorcontrib>Gutell, Robin R.</creatorcontrib><creatorcontrib>Miranker, Daniel P.</creatorcontrib><title>Biclustering as a method for RNA local multiple sequence alignment</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivations: Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering is intended to address. 
Results: We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension to that suite to larger problem sizes. Also, alignments were evaluated of two large datasets of current biological interest, T box sequences and Group IC1 Introns. The results were compared with alignments computed by ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using Sum of Pairs (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or greater sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. Availability: BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/ Contact: shuwang2006@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Base Sequence</subject><subject>Biological and medical sciences</subject><subject>Cluster Analysis</subject><subject>Fundamental and applied biological sciences. 
Psychology</subject><subject>General aspects</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)</subject><subject>Molecular Sequence Data</subject><subject>Pattern Recognition, Automated - methods</subject><subject>Sequence Alignment - methods</subject><subject>Sequence Analysis, RNA - methods</subject><subject>Sequence Homology, Nucleic Acid</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkltrFTEUhYNY7EV_ghIE-zY210nyIrSltdZWQY4ovoScTOY0NZMck5mi_96UObTWF4VAAvnW2ntnBYDnGL3GSNGDpU8-9ikPZvS2HCzHgUn-COxg1qKGIK4e1zNtRcMkottgt5RrhDhmjD0B21gogpliO-DoyNswldFlH1fQFGjg4Mar1MHqDT99OIQhWRPgMIXRr4ODxf2YXLQOmuBXcXBxfAq2ehOKe7bZ98Dn05PF8Vlz8fHtu-PDi8Zy0Y61J2yRIkjw3tkWU0asqcsJzLiVSEiGOOl4Z7nhBFlmrO3okktmUD3Sju6BN7PveloOrrO1dDZBr7MfTP6lk_H64U30V3qVbjQhRFLKq8H-xiCnOkQZ9eCLdSGY6NJUdKsQEwT_G8SqFbLlrIIv_wKv05RjfYXKyApxcevGZ8jmVEp2_V3LGOnbLPXDLPWcZdW9-HPee9UmvAq82gCm1Iz6bKL15Z5TCrVKksqhmUvT-r9rN7PE16_x805k8nfdCiq4Pvv6TZ_zxZfL08uFfk9_A7O2zgs</recordid><startdate>20071215</startdate><enddate>20071215</enddate><creator>Wang, Shu</creator><creator>Gutell, Robin R.</creator><creator>Miranker, Daniel P.</creator><general>Oxford University Press</general><general>Oxford Publishing Limited 
(England)</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20071215</creationdate><title>Biclustering as a method for RNA local multiple sequence alignment</title><author>Wang, Shu ; Gutell, Robin R. ; Miranker, Daniel P.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c576t-201c092075fec61342ca2cae7145c80784052d5dc5a520c4accd3b584a0acc3d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Base Sequence</topic><topic>Biological and medical sciences</topic><topic>Cluster Analysis</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>General aspects</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</topic><topic>Molecular Sequence Data</topic><topic>Pattern Recognition, Automated - methods</topic><topic>Sequence Alignment - methods</topic><topic>Sequence Analysis, RNA - methods</topic><topic>Sequence Homology, Nucleic Acid</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Shu</creatorcontrib><creatorcontrib>Gutell, Robin R.</creatorcontrib><creatorcontrib>Miranker, Daniel P.</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical 
Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wang, Shu</au><au>Gutell, Robin R.</au><au>Miranker, Daniel P.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Biclustering as a method for RNA local multiple sequence alignment</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2007-12-15</date><risdate>2007</risdate><volume>23</volume><issue>24</issue><spage>3289</spage><epage>3296</epage><pages>3289-3296</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Motivations: Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering is intended to address. Results: We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. 
Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension to that suite to larger problem sizes. Also, alignments were evaluated of two large datasets of current biological interest, T box sequences and Group IC1 Introns. The results were compared with alignments computed by ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using Sum of Pairs (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or greater sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. Availability: BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/ Contact: shuwang2006@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>17921494</pmid><doi>10.1093/bioinformatics/btm485</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1367-4803
ispartof	Bioinformatics, 2007-12, Vol.23 (24), p.3289-3296
issn	1367-4803 1460-2059 1367-4811
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2228335
source	Oxford Journals Open Access Collection
subjects	Algorithms; Artificial Intelligence; Base Sequence; Biological and medical sciences; Cluster Analysis; Fundamental and applied biological sciences. Psychology; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Molecular Sequence Data; Pattern Recognition, Automated - methods; Sequence Alignment - methods; Sequence Analysis, RNA - methods; Sequence Homology, Nucleic Acid
title	Biclustering as a method for RNA local multiple sequence alignment
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T07%3A41%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Biclustering%20as%20a%20method%20for%20RNA%20local%20multiple%20sequence%20alignment&rft.jtitle=Bioinformatics&rft.au=Wang,%20Shu&rft.date=2007-12-15&rft.volume=23&rft.issue=24&rft.spage=3289&rft.epage=3296&rft.pages=3289-3296&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btm485&rft_dat=%3Cproquest_TOX%3E19678654%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198678575&rft_id=info:pmid/17921494&rft_oup_id=10.1093/bioinformatics/btm485&rfr_iscdi=true