Evaluation and comparison of gene clustering methods in microarray analysis

Motivation: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of the expression of thousands of genes. Gene clustering analysis is useful for discovering groups of correlated genes potentially co-regulated or associated with the disease...

Bibliographic details
Published in: Bioinformatics 2006-10, Vol.22 (19), p.2405-2412
Main authors: Thalamuthu, Anbupalam; Mukhopadhyay, Indranil; Zheng, Xiaojing; Tseng, George C.
Format: Article
Language: eng
container_end_page 2412
container_issue 19
container_start_page 2405
container_title Bioinformatics
container_volume 22
creator Thalamuthu, Anbupalam
Mukhopadhyay, Indranil
Zheng, Xiaojing
Tseng, George C.
description Motivation: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of the expression of thousands of genes. Gene clustering analysis is useful for discovering groups of correlated genes potentially co-regulated or associated with the disease or conditions under investigation. Many clustering methods, including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering, have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods. Results: In this paper, six gene clustering methods are evaluated on simulated data from a hierarchical log-normal model with various degrees of perturbation, as well as on four real datasets. A weighted Rand index is proposed for measuring the similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods on the real data is assessed by a predictive accuracy analysis through verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform the other clustering methods in both simulated and real data, while hierarchical clustering and SOM perform among the worst. Our analysis provides deep insight into the complicated problem of clustering genes by expression profile and serves as a practical guideline for routine microarray cluster analysis. Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.
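The abstract above refers to a weighted Rand index for comparing two clustering results. The record does not give that weighted definition, so it is not reproduced here. As a rough illustration of the underlying idea only, the following minimal sketch computes the plain (unweighted) Rand index: the fraction of item pairs on which two clusterings agree (both place the pair together, or both place it apart). The function name `rand_index` and the example labels are hypothetical, and the sketch makes no attempt to handle scattered (unclustered) genes.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Plain (unweighted) Rand index between two clusterings of the same items.

    NOTE: this is the classical Rand index, not the weighted variant
    proposed in the paper; scattered genes are not treated specially.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]  # pair co-clustered in A?
        same_b = labels_b[i] == labels_b[j]  # pair co-clustered in B?
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical example: the same partition under different label names.
identical = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Hypothetical example: two partitions that mostly disagree.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index depends only on which items share a cluster, not on the label values themselves, which is why relabelled but otherwise identical partitions score as a perfect match.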
doi_str_mv 10.1093/bioinformatics/btl406
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_68950478</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1144405741</sourcerecordid><originalsourceid>FETCH-LOGICAL-c546t-c743acc61343f703a81ee8373306b25beab688e0258ed017c2bd848a06a12e6f3</originalsourceid><addsrcrecordid>eNqFkU9v1DAQxS0EoqXwEUAREtzS-r-dI6oKi1oKB0BoL9bEcYpLEm89CWK_PYZdUbUXTuORf-_pjR4hzxk9ZrQRJ21McepTHmGOHk_aeZBUPyCHTGpac6qah-UttKmlpeKAPEG8plQxKeVjcsC0tVwrcUjOz37CsBSPNFUwdZVP4wZyxLKmvroKU6j8sOAccpyuqjHM31OHVZyqMfqcIGfYFh0MW4z4lDzqYcDwbD-PyJe3Z59PV_XFx3fvT99c1F5JPdfeSAHeayak6A0VYFkIVhghqG65agO0JV6gXNnQUWY8bzsrLVANjAfdiyPyeue7yelmCTi7MaIPwwBTSAs6bRtFpbH_BTnlzCjeFPDlPfA6LbmchY411pRsfyG1g8rhiDn0bpPjCHnrGHV_OnF3O3G7Toruxd58acfQ3ar2JRTg1R4A9DD0GSYf8ZaznCvdmMLVOy6WPn79-4f8w-mSUbnVt7X79HV9uWaXK_dB_AagKKml</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198737329</pqid></control><display><type>article</type><title>Evaluation and comparison of gene clustering methods in microarray analysis</title><source>MEDLINE</source><source>Oxford Journals Open Access Collection</source><source>EZB-FREE-00999 freely available EZB journals</source><source>Alma/SFX Local Collection</source><creator>Thalamuthu, Anbupalam ; Mukhopadhyay, Indranil ; Zheng, Xiaojing ; Tseng, George C.</creator><creatorcontrib>Thalamuthu, Anbupalam ; Mukhopadhyay, Indranil ; Zheng, Xiaojing ; Tseng, George C.</creatorcontrib><description>Motivation: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of gene expression in thousands of genes. Gene clustering analysis is found useful for discovering groups of correlated genes potentially co-regulated or associated to the disease or conditions under investigation. 
Many clustering methods including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods. Results: In this paper, six gene clustering methods are evaluated by simulated data from a hierarchical log-normal model with various degrees of perturbation as well as four real datasets. A weighted Rand index is proposed for measuring similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods in the real data is assessed by a predictive accuracy analysis through verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform other clustering methods both in simulated and real data while hierarchical clustering and SOM perform among the worst. Our analysis provides deep insight to the complicated gene clustering problem of expression profile and serves as a practical guideline for routine microarray cluster analysis. Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btl406</identifier><identifier>PMID: 16882653</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; Artificial Intelligence ; Biological and medical sciences ; Cluster Analysis ; Computer Simulation ; Fundamental and applied biological sciences. Psychology ; Gene Expression Profiling - methods ; General aspects ; Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects) ; Models, Genetic ; Models, Statistical ; Multigene Family ; Oligonucleotide Array Sequence Analysis - methods ; Reproducibility of Results ; Sensitivity and Specificity ; Software ; Software Validation</subject><ispartof>Bioinformatics, 2006-10, Vol.22 (19), p.2405-2412</ispartof><rights>2007 INIST-CNRS</rights><rights>Copyright Oxford University Press(England) Oct 1, 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c546t-c743acc61343f703a81ee8373306b25beab688e0258ed017c2bd848a06a12e6f3</citedby><cites>FETCH-LOGICAL-c546t-c743acc61343f703a81ee8373306b25beab688e0258ed017c2bd848a06a12e6f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=18225697$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16882653$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Thalamuthu, Anbupalam</creatorcontrib><creatorcontrib>Mukhopadhyay, Indranil</creatorcontrib><creatorcontrib>Zheng, Xiaojing</creatorcontrib><creatorcontrib>Tseng, George C.</creatorcontrib><title>Evaluation and comparison of gene clustering methods in microarray analysis</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of gene expression in thousands of genes. Gene clustering analysis is found useful for discovering groups of correlated genes potentially co-regulated or associated to the disease or conditions under investigation. 
Many clustering methods including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods. Results: In this paper, six gene clustering methods are evaluated by simulated data from a hierarchical log-normal model with various degrees of perturbation as well as four real datasets. A weighted Rand index is proposed for measuring similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods in the real data is assessed by a predictive accuracy analysis through verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform other clustering methods both in simulated and real data while hierarchical clustering and SOM perform among the worst. Our analysis provides deep insight to the complicated gene clustering problem of expression profile and serves as a practical guideline for routine microarray cluster analysis. Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Biological and medical sciences</subject><subject>Cluster Analysis</subject><subject>Computer Simulation</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Gene Expression Profiling - methods</subject><subject>General aspects</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</subject><subject>Models, Genetic</subject><subject>Models, Statistical</subject><subject>Multigene Family</subject><subject>Oligonucleotide Array Sequence Analysis - methods</subject><subject>Reproducibility of Results</subject><subject>Sensitivity and Specificity</subject><subject>Software</subject><subject>Software Validation</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkU9v1DAQxS0EoqXwEUAREtzS-r-dI6oKi1oKB0BoL9bEcYpLEm89CWK_PYZdUbUXTuORf-_pjR4hzxk9ZrQRJ21McepTHmGOHk_aeZBUPyCHTGpac6qah-UttKmlpeKAPEG8plQxKeVjcsC0tVwrcUjOz37CsBSPNFUwdZVP4wZyxLKmvroKU6j8sOAccpyuqjHM31OHVZyqMfqcIGfYFh0MW4z4lDzqYcDwbD-PyJe3Z59PV_XFx3fvT99c1F5JPdfeSAHeayak6A0VYFkIVhghqG65agO0JV6gXNnQUWY8bzsrLVANjAfdiyPyeue7yelmCTi7MaIPwwBTSAs6bRtFpbH_BTnlzCjeFPDlPfA6LbmchY411pRsfyG1g8rhiDn0bpPjCHnrGHV_OnF3O3G7Toruxd58acfQ3ar2JRTg1R4A9DD0GSYf8ZaznCvdmMLVOy6WPn79-4f8w-mSUbnVt7X79HV9uWaXK_dB_AagKKml</recordid><startdate>20061001</startdate><enddate>20061001</enddate><creator>Thalamuthu, Anbupalam</creator><creator>Mukhopadhyay, Indranil</creator><creator>Zheng, Xiaojing</creator><creator>Tseng, George C.</creator><general>Oxford University Press</general><general>Oxford Publishing Limited 
(England)</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>20061001</creationdate><title>Evaluation and comparison of gene clustering methods in microarray analysis</title><author>Thalamuthu, Anbupalam ; Mukhopadhyay, Indranil ; Zheng, Xiaojing ; Tseng, George C.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c546t-c743acc61343f703a81ee8373306b25beab688e0258ed017c2bd848a06a12e6f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Biological and medical sciences</topic><topic>Cluster Analysis</topic><topic>Computer Simulation</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Gene Expression Profiling - methods</topic><topic>General aspects</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</topic><topic>Models, Genetic</topic><topic>Models, Statistical</topic><topic>Multigene Family</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Reproducibility of Results</topic><topic>Sensitivity and Specificity</topic><topic>Software</topic><topic>Software Validation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Thalamuthu, Anbupalam</creatorcontrib><creatorcontrib>Mukhopadhyay, Indranil</creatorcontrib><creatorcontrib>Zheng, Xiaojing</creatorcontrib><creatorcontrib>Tseng, George C.</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research 
Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Thalamuthu, Anbupalam</au><au>Mukhopadhyay, Indranil</au><au>Zheng, Xiaojing</au><au>Tseng, George C.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Evaluation and comparison of gene clustering methods in microarray analysis</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2006-10-01</date><risdate>2006</risdate><volume>22</volume><issue>19</issue><spage>2405</spage><epage>2412</epage><pages>2405-2412</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Motivation: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of gene expression in thousands of genes. Gene clustering analysis is found useful for discovering groups of correlated genes potentially co-regulated or associated to the disease or conditions under investigation. Many clustering methods including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods. 
Results: In this paper, six gene clustering methods are evaluated by simulated data from a hierarchical log-normal model with various degrees of perturbation as well as four real datasets. A weighted Rand index is proposed for measuring similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods in the real data is assessed by a predictive accuracy analysis through verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform other clustering methods both in simulated and real data while hierarchical clustering and SOM perform among the worst. Our analysis provides deep insight to the complicated gene clustering problem of expression profile and serves as a practical guideline for routine microarray cluster analysis. Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>16882653</pmid><doi>10.1093/bioinformatics/btl406</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2006-10, Vol.22 (19), p.2405-2412
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_68950478
source MEDLINE; Oxford Journals Open Access Collection; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection
subjects Algorithms
Artificial Intelligence
Biological and medical sciences
Cluster Analysis
Computer Simulation
Fundamental and applied biological sciences. Psychology
Gene Expression Profiling - methods
General aspects
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Models, Genetic
Models, Statistical
Multigene Family
Oligonucleotide Array Sequence Analysis - methods
Reproducibility of Results
Sensitivity and Specificity
Software
Software Validation
title Evaluation and comparison of gene clustering methods in microarray analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A00%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluation%20and%20comparison%20of%20gene%20clustering%20methods%20in%20microarray%20analysis&rft.jtitle=Bioinformatics&rft.au=Thalamuthu,%20Anbupalam&rft.date=2006-10-01&rft.volume=22&rft.issue=19&rft.spage=2405&rft.epage=2412&rft.pages=2405-2412&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btl406&rft_dat=%3Cproquest_cross%3E1144405741%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198737329&rft_id=info:pmid/16882653&rfr_iscdi=true