Evaluation and comparison of gene clustering methods in microarray analysis
Published in: | Bioinformatics 2006-10, Vol. 22 (19), pp. 2405-2412 |
---|---|
Main authors: | Thalamuthu, Anbupalam; Mukhopadhyay, Indranil; Zheng, Xiaojing; Tseng, George C. |
Format: | Article |
Language: | English |
Online access: | Full text |
container_end_page | 2412 |
---|---|
container_issue | 19 |
container_start_page | 2405 |
container_title | Bioinformatics |
container_volume | 22 |
creator | Thalamuthu, Anbupalam; Mukhopadhyay, Indranil; Zheng, Xiaojing; Tseng, George C. |
description | Motivation: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of gene expression in thousands of genes. Gene clustering analysis has been found useful for discovering groups of correlated genes potentially co-regulated or associated with the disease or conditions under investigation. Many clustering methods, including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering, have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods. Results: In this paper, six gene clustering methods are evaluated on simulated data from a hierarchical log-normal model with various degrees of perturbation, as well as on four real datasets. A weighted Rand index is proposed for measuring the similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods on the real data is assessed by a predictive accuracy analysis using verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform the other clustering methods in both simulated and real data, while hierarchical clustering and SOM perform among the worst. Our analysis provides deep insight into the complicated problem of clustering genes by expression profile and serves as a practical guideline for routine microarray cluster analysis. Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online. |
doi_str_mv | 10.1093/bioinformatics/btl406 |
format | Article |
pmid | 16882653 |
coden | BOINFP |
publisher | Oxford University Press, Oxford |
rights | 2007 INIST-CNRS; Copyright Oxford University Press (England), Oct 1, 2006 |
fulltext | fulltext |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics, 2006-10, Vol.22 (19), p.2405-2412 |
issn | 1367-4803 1460-2059 1367-4811 |
language | eng |
recordid | cdi_proquest_miscellaneous_68950478 |
source | MEDLINE; Oxford Journals Open Access Collection; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection |
subjects | Algorithms; Artificial Intelligence; Biological and medical sciences; Cluster Analysis; Computer Simulation; Fundamental and applied biological sciences. Psychology; Gene Expression Profiling - methods; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Models, Genetic; Models, Statistical; Multigene Family; Oligonucleotide Array Sequence Analysis - methods; Reproducibility of Results; Sensitivity and Specificity; Software; Software Validation |
title | Evaluation and comparison of gene clustering methods in microarray analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T21%3A00%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Evaluation%20and%20comparison%20of%20gene%20clustering%20methods%20in%20microarray%20analysis&rft.jtitle=Bioinformatics&rft.au=Thalamuthu,%20Anbupalam&rft.date=2006-10-01&rft.volume=22&rft.issue=19&rft.spage=2405&rft.epage=2412&rft.pages=2405-2412&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btl406&rft_dat=%3Cproquest_cross%3E1144405741%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198737329&rft_id=info:pmid/16882653&rfr_iscdi=true |
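The description mentions a weighted Rand index for comparing two clustering results that may contain scattered (unclustered) genes. This record does not give the paper's weighting, so the sketch below shows only the standard, unweighted Rand index that the name refers to, assuming every gene carries a cluster label in both results. The function name `rand_index` and its interface are illustrative assumptions, not the authors' code.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Standard (unweighted) Rand index between two clusterings of the same genes.

    Assumption: every gene has a label in both clusterings. The paper's
    weighted variant for scattered genes is NOT implemented here.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same genes")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two genes to form a pair")

    agreements = 0
    pairs = 0
    for i, j in combinations(range(n), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        # A pair agrees if both clusterings group the two genes together,
        # or both clusterings separate them.
        if together_a == together_b:
            agreements += 1
        pairs += 1
    return agreements / pairs


if __name__ == "__main__":
    print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

For example, `rand_index([0, 0, 1, 1], [0, 0, 1, 2])` returns 5/6 (about 0.833): of the six gene pairs, only the last one is grouped together by the first clustering and separated by the second. Because only pairwise co-membership is compared, relabeling clusters does not change the score.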