A systematic comparison and evaluation of biclustering methods for gene expression data

Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, allows to identify set...

Bibliographic details
Published in: Bioinformatics 2006-05, Vol.22 (9), p.1122-1129
Main authors: Prelić, Amela, Bleuler, Stefan, Zimmermann, Philip, Wille, Anja, Bühlmann, Peter, Gruissem, Wilhelm, Hennig, Lars, Thiele, Lothar, Zitzler, Eckart
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1129
container_issue 9
container_start_page 1122
container_title Bioinformatics
container_volume 22
creator Prelić, Amela
Bleuler, Stefan
Zimmermann, Philip
Wille, Anja
Bühlmann, Peter
Gruissem, Wilhelm
Hennig, Lars
Thiele, Lothar
Zitzler, Eckart
description Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, allows to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters as well as with other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of the biclustering method are currently available.
Results: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) already the simple reference model delivers relevant patterns within all considered settings.
Availability: The datasets used, the outcomes of the biclustering algorithms and the Bimax implementation for the reference model are available at
Contact: bleuler@tik.ee.ethz.ch
Supplementary information: Supplementary data are available at
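The binary reference model mentioned in the description can be illustrated with a small sketch. It assumes, going beyond what the abstract itself states, that the expression matrix has already been binarized (1 = gene responds in a sample) and that an "optimal grouping" is an inclusion-maximal all-ones submatrix. The function name `maximal_biclusters` and the example matrix are hypothetical. This is a brute-force enumeration over column subsets, not the divide-and-conquer Bimax algorithm, and it is only practical for very small matrices.

```python
from itertools import combinations


def maximal_biclusters(matrix):
    """Enumerate inclusion-maximal all-ones submatrices of a binary matrix.

    Brute force over column subsets (exponential in the number of columns).
    This is an illustrative stand-in, NOT the Bimax algorithm itself.
    Returns a list of (row_indices, column_indices) tuples.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    found = []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Rows that contain a 1 in every selected column.
            rows = [r for r in range(n_rows)
                    if all(matrix[r][c] for c in cols)]
            if not rows:
                continue
            # Columns that contain a 1 in every one of those rows.
            closed = tuple(c for c in range(n_cols)
                           if all(matrix[r][c] for r in rows))
            # The pair is maximal exactly when no column can be added.
            if closed == cols:
                found.append((tuple(rows), cols))
    return found


# Hypothetical 3-gene x 3-sample binarized matrix (rows = genes).
example = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

for genes, samples in maximal_biclusters(example):
    print("genes", genes, "samples", samples)
```

Each reported pair cannot be extended by another gene or sample without including a 0, which is the property that makes such a reference model exactly solvable on small inputs.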
doi_str_mv 10.1093/bioinformatics/btl060
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_67874331</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>19993805</sourcerecordid><originalsourceid>FETCH-LOGICAL-c546t-93fdca142c170d8adbfbccf541babe3ce6ba5f047e3dd0d48162c93a8ef5c633</originalsourceid><addsrcrecordid>eNqFkV1LHDEUhkNpqR_tT7CEgt5NTTYfM7lU0VoQCrLQ0ptw8qWxM5NtMiP67826S8Xe9CoJ5zlvzuFB6ICSL5QodmxiimNIeYAp2nJspp5I8gbtUi5JsyBCva13JtuGd4TtoL1S7ggRlHP-Hu1QKQhRnO6iHye4PJbJP8dgm4YV5FjSiGF02N9DP9dCfaaATbT9XNEcxxs8-Ok2uYLrBPjGjx77h1X2paxZBxN8QO8C9MV_3J77aHlxvjy7bK6-f_12dnLVWMHl1CgWnAXKF5a2xHXgTDDWBsGpAeOZ9dKACIS3njlHHO-oXFjFoPNBWMnYPjraxK5y-jP7MukhFuv7Hkaf5qJl27WcMfpfkCqlWEdEBT__A96lOY91h8p0UhL-_K3YQDanUrIPepXjAPlRU6LXevRrPXqjp_Z92obPZvDupWvrowKHWwCKhT5kGG0sL1zbyk6p9ZTNhotVyMPfOuTfdWXWCn3585eurLi-Pl3qBXsCe9ivDw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198660463</pqid></control><display><type>article</type><title>A systematic comparison and evaluation of biclustering methods for gene expression data</title><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Oxford Journals Open Access Collection</source><source>Alma/SFX Local Collection</source><creator>Prelić, Amela ; Bleuler, Stefan ; Zimmermann, Philip ; Wille, Anja ; Bühlmann, Peter ; Gruissem, Wilhelm ; Hennig, Lars ; Thiele, Lothar ; Zitzler, Eckart</creator><creatorcontrib>Prelić, Amela ; Bleuler, Stefan ; Zimmermann, Philip ; Wille, Anja ; Bühlmann, Peter ; Gruissem, Wilhelm ; Hennig, Lars ; Thiele, Lothar ; Zitzler, Eckart</creatorcontrib><description>Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. 
The underlying concept, which is often referred to as biclustering, allows to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters as well as with other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of the biclustering method are currently available. Results: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) already the simple reference model delivers relevant patterns within all considered settings. 
Availability: The datasets used, the outcomes of the biclustering algorithms and the Bimax implementation for the reference model are available at Contact:bleuler@tik.ee.ethz.ch Supplementary information: Supplementary data are available at</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btl060</identifier><identifier>PMID: 16500941</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; Arabidopsis thaliana ; Artificial Intelligence ; Biological and medical sciences ; Cluster Analysis ; Databases, Genetic ; Fundamental and applied biological sciences. Psychology ; Gene Expression - physiology ; Gene Expression Profiling - methods ; General aspects ; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects) ; Oligonucleotide Array Sequence Analysis - methods ; Pattern Recognition, Automated - methods ; Saccharomyces cerevisiae</subject><ispartof>Bioinformatics, 2006-05, Vol.22 (9), p.1122-1129</ispartof><rights>2006 INIST-CNRS</rights><rights>Copyright Oxford University Press(England) May 1, 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c546t-93fdca142c170d8adbfbccf541babe3ce6ba5f047e3dd0d48162c93a8ef5c633</citedby><cites>FETCH-LOGICAL-c546t-93fdca142c170d8adbfbccf541babe3ce6ba5f047e3dd0d48162c93a8ef5c633</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=17768995$$DView record in Pascal 
Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16500941$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Prelić, Amela</creatorcontrib><creatorcontrib>Bleuler, Stefan</creatorcontrib><creatorcontrib>Zimmermann, Philip</creatorcontrib><creatorcontrib>Wille, Anja</creatorcontrib><creatorcontrib>Bühlmann, Peter</creatorcontrib><creatorcontrib>Gruissem, Wilhelm</creatorcontrib><creatorcontrib>Hennig, Lars</creatorcontrib><creatorcontrib>Thiele, Lothar</creatorcontrib><creatorcontrib>Zitzler, Eckart</creatorcontrib><title>A systematic comparison and evaluation of biclustering methods for gene expression data</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, allows to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters as well as with other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of the biclustering method are currently available. Results: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). 
Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) already the simple reference model delivers relevant patterns within all considered settings. Availability: The datasets used, the outcomes of the biclustering algorithms and the Bimax implementation for the reference model are available at Contact:bleuler@tik.ee.ethz.ch Supplementary information: Supplementary data are available at</description><subject>Algorithms</subject><subject>Arabidopsis thaliana</subject><subject>Artificial Intelligence</subject><subject>Biological and medical sciences</subject><subject>Cluster Analysis</subject><subject>Databases, Genetic</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Gene Expression - physiology</subject><subject>Gene Expression Profiling - methods</subject><subject>General aspects</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</subject><subject>Oligonucleotide Array Sequence Analysis - methods</subject><subject>Pattern Recognition, Automated - methods</subject><subject>Saccharomyces cerevisiae</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkV1LHDEUhkNpqR_tT7CEgt5NTTYfM7lU0VoQCrLQ0ptw8qWxM5NtMiP67826S8Xe9CoJ5zlvzuFB6ICSL5QodmxiimNIeYAp2nJspp5I8gbtUi5JsyBCva13JtuGd4TtoL1S7ggRlHP-Hu1QKQhRnO6iHye4PJbJP8dgm4YV5FjSiGF02N9DP9dCfaaATbT9XNEcxxs8-Ok2uYLrBPjGjx77h1X2paxZBxN8QO8C9MV_3J77aHlxvjy7bK6-f_12dnLVWMHl1CgWnAXKF5a2xHXgTDDWBsGpAeOZ9dKACIS3njlHHO-oXFjFoPNBWMnYPjraxK5y-jP7MukhFuv7Hkaf5qJl27WcMfpfkCqlWEdEBT__A96lOY91h8p0UhL-_K3YQDanUrIPepXjAPlRU6LXevRrPXqjp_Z92obPZvDupWvrowKHWwCKhT5kGG0sL1zbyk6p9ZTNhotVyMPfOuTfdWXWCn3585eurLi-Pl3qBXsCe9ivDw</recordid><startdate>20060501</startdate><enddate>20060501</enddate><creator>Prelić, Amela</creator><creator>Bleuler, Stefan</creator><creator>Zimmermann, Philip</creator><creator>Wille, Anja</creator><creator>Bühlmann, Peter</creator><creator>Gruissem, Wilhelm</creator><creator>Hennig, Lars</creator><creator>Thiele, Lothar</creator><creator>Zitzler, Eckart</creator><general>Oxford University Press</general><general>Oxford Publishing Limited 
(England)</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>M7N</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>20060501</creationdate><title>A systematic comparison and evaluation of biclustering methods for gene expression data</title><author>Prelić, Amela ; Bleuler, Stefan ; Zimmermann, Philip ; Wille, Anja ; Bühlmann, Peter ; Gruissem, Wilhelm ; Hennig, Lars ; Thiele, Lothar ; Zitzler, Eckart</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c546t-93fdca142c170d8adbfbccf541babe3ce6ba5f047e3dd0d48162c93a8ef5c633</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Arabidopsis thaliana</topic><topic>Artificial Intelligence</topic><topic>Biological and medical sciences</topic><topic>Cluster Analysis</topic><topic>Databases, Genetic</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Gene Expression - physiology</topic><topic>Gene Expression Profiling - methods</topic><topic>General aspects</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Pattern Recognition, Automated - methods</topic><topic>Saccharomyces cerevisiae</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Prelić, Amela</creatorcontrib><creatorcontrib>Bleuler, Stefan</creatorcontrib><creatorcontrib>Zimmermann, Philip</creatorcontrib><creatorcontrib>Wille, Anja</creatorcontrib><creatorcontrib>Bühlmann, Peter</creatorcontrib><creatorcontrib>Gruissem, Wilhelm</creatorcontrib><creatorcontrib>Hennig, Lars</creatorcontrib><creatorcontrib>Thiele, Lothar</creatorcontrib><creatorcontrib>Zitzler, Eckart</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference 
Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Prelić, Amela</au><au>Bleuler, Stefan</au><au>Zimmermann, Philip</au><au>Wille, Anja</au><au>Bühlmann, Peter</au><au>Gruissem, Wilhelm</au><au>Hennig, Lars</au><au>Thiele, Lothar</au><au>Zitzler, Eckart</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A systematic comparison and evaluation of biclustering methods for gene expression data</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2006-05-01</date><risdate>2006</risdate><volume>22</volume><issue>9</issue><spage>1122</spage><epage>1129</epage><pages>1122-1129</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. 
The underlying concept, which is often referred to as biclustering, allows to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters as well as with other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of the biclustering method are currently available. Results: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) already the simple reference model delivers relevant patterns within all considered settings. Availability: The datasets used, the outcomes of the biclustering algorithms and the Bimax implementation for the reference model are available at Contact:bleuler@tik.ee.ethz.ch Supplementary information: Supplementary data are available at</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>16500941</pmid><doi>10.1093/bioinformatics/btl060</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2006-05, Vol.22 (9), p.1122-1129
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_67874331
source MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Oxford Journals Open Access Collection; Alma/SFX Local Collection
subjects Algorithms
Arabidopsis thaliana
Artificial Intelligence
Biological and medical sciences
Cluster Analysis
Databases, Genetic
Fundamental and applied biological sciences. Psychology
Gene Expression - physiology
Gene Expression Profiling - methods
General aspects
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Oligonucleotide Array Sequence Analysis - methods
Pattern Recognition, Automated - methods
Saccharomyces cerevisiae
title A systematic comparison and evaluation of biclustering methods for gene expression data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T14%3A06%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20systematic%20comparison%20and%20evaluation%20of%20biclustering%20methods%20for%20gene%20expression%20data&rft.jtitle=Bioinformatics&rft.au=Preli%C4%87,%20Amela&rft.date=2006-05-01&rft.volume=22&rft.issue=9&rft.spage=1122&rft.epage=1129&rft.pages=1122-1129&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btl060&rft_dat=%3Cproquest_cross%3E19993805%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198660463&rft_id=info:pmid/16500941&rfr_iscdi=true