An evolutionary clustering algorithm for gene expression microarray data analysis

Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in g...

Bibliographic details
Published in: IEEE transactions on evolutionary computation, 2006-06, Vol. 10 (3), p. 296-314
Main authors: Ma, P.C.H., Chan, K.C.C., Xin Yao, Chiu, D.K.Y.
Format: Article
Language: eng
container_end_page 314
container_issue 3
container_start_page 296
container_title IEEE transactions on evolutionary computation
container_volume 10
creator Ma, P.C.H.
Chan, K.C.C.
Xin Yao
Chiu, D.K.Y.
description Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which is typically very noisy, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on this encoding scheme, it uses a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that EvoCluster adopts can distinguish how relevant each feature value is in determining a particular cluster grouping. As such, rather than relying only on local pairwise distances, it also considers how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.
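The encoding described in the abstract (one chromosome per cluster grouping, one gene per cluster) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `Chromosome`, `random_chromosome`, `inject_cluster`, and `is_partition` are hypothetical, and the operator shown is a generic grouping-style exchange, not one of EvoCluster's actual reproduction operators or its fitness function.

```python
import random

# Hypothetical encoding: a chromosome is a list of genes,
# and each gene is the set of record indices in one cluster.
Chromosome = list[frozenset[int]]


def random_chromosome(n_records: int, rng: random.Random, max_clusters: int = 6) -> Chromosome:
    """Build a random grouping; the number of clusters is not fixed in advance.

    Assumes n_records >= 2.
    """
    k = rng.randint(2, min(max_clusters, n_records))
    labels = [rng.randrange(k) for _ in range(n_records)]
    genes = [frozenset(i for i, lab in enumerate(labels) if lab == c) for c in range(k)]
    return [g for g in genes if g]  # drop clusters that received no records


def inject_cluster(receiver: Chromosome, donor: Chromosome, rng: random.Random) -> Chromosome:
    """Copy one whole cluster from donor into receiver (illustrative operator only)."""
    moved = rng.choice(donor)
    # Remove the moved records from the receiver's clusters, dropping emptied ones.
    child = [g - moved for g in receiver]
    child = [g for g in child if g]
    child.append(moved)
    return child


def is_partition(chrom: Chromosome, n_records: int) -> bool:
    """Check that every record belongs to exactly one cluster."""
    seen = sorted(i for g in chrom for i in g)
    return seen == list(range(n_records))
```

Because genes are whole clusters, an operator like `inject_cluster` moves grouping information between chromosomes as a unit, and the child's cluster count may differ from either parent's. This loosely mirrors the abstract's point that the number of clusters need not be decided in advance.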
doi_str_mv 10.1109/TEVC.2005.859371
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_896188803</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1637689</ieee_id><sourcerecordid>2342479111</sourcerecordid><originalsourceid>FETCH-LOGICAL-c425t-8a7fbabe0a54eba8302e8c69ab950cd8c409cb8afde9fa94a840e3424d117d3d3</originalsourceid><addsrcrecordid>eNp90U1LxDAQBuAiCurqXfASBPXUddKkaXKUxS8QRFDxVqbpdI102zVpxf33ZllB8OApgTzzwuRNkiMOU87BXDxdvcymGUA-1bkRBd9K9riRPAXI1Ha8gzZpUejX3WQ_hHcALnNu9pLHy47RZ9-Og-s79Ctm2zEM5F03Z9jOe--GtwVres_m1BGjr6WnEKJlC2d9j97jitU4IMMO21Vw4SDZabANdPhzTpLn66un2W16_3BzN7u8T63M8iHVWDQVVgSYS6pQC8hIW2WwMjnYWlsJxlYam5pMg0ailkBCZrLmvKhFLSbJ-SZ36fuPkcJQLlyw1LbYUT-GUhvFtdYgojz7V2YaFEihIjz5A9_70ce9YprKlRIiLyKCDYrrh-CpKZfeLeLXlRzKdRXluopyXUW5qSKOnP7kYrDYNh4768LvXKGl4TKL7njjHBH9PitRKG3ENzedk2I</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>865663357</pqid></control><display><type>article</type><title>An evolutionary clustering algorithm for gene expression microarray data analysis</title><source>IEEE Electronic Library (IEL)</source><creator>Ma, P.C.H. ; Chan, K.C.C. ; Xin Yao ; Chiu, D.K.Y.</creator><creatorcontrib>Ma, P.C.H. ; Chan, K.C.C. ; Xin Yao ; Chiu, D.K.Y.</creatorcontrib><description>Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. 
For more effective clustering of gene expression microarray data, which is typically characterized by a lot of noise, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on such encoding scheme, it makes use of a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that the EvoCluster adopts is able to differentiate between how relevant a feature value is in determining a particular cluster grouping. As such, instead of just local pairwise distances, it also takes into consideration how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. 
Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.</description><identifier>ISSN: 1089-778X</identifier><identifier>EISSN: 1941-0026</identifier><identifier>DOI: 10.1109/TEVC.2005.859371</identifier><identifier>CODEN: ITEVF5</identifier><language>eng</language><publisher>New York, NY: IEEE</publisher><subject>Algorithms ; Applied sciences ; Artificial intelligence ; Binding sites ; Bioinformatics ; Biological cells ; Biological system modeling ; Chromosomes ; Clustering ; Clustering algorithms ; Clusters ; Computer science; control theory; systems ; Computer simulation ; Data analysis ; DNA sequence analysis ; Encoding ; Evolutionary ; Evolutionary algorithms ; evolutionary algorithms (EAs) ; Evolutionary computation ; Exact sciences and technology ; Gene expression ; gene expression microarray data analysis ; Learning and adaptive systems ; Studies ; Testing</subject><ispartof>IEEE transactions on evolutionary computation, 2006-06, Vol.10 (3), p.296-314</ispartof><rights>2006 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c425t-8a7fbabe0a54eba8302e8c69ab950cd8c409cb8afde9fa94a840e3424d117d3d3</citedby><cites>FETCH-LOGICAL-c425t-8a7fbabe0a54eba8302e8c69ab950cd8c409cb8afde9fa94a840e3424d117d3d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1637689$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1637689$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=17849142$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Ma, P.C.H.</creatorcontrib><creatorcontrib>Chan, K.C.C.</creatorcontrib><creatorcontrib>Xin Yao</creatorcontrib><creatorcontrib>Chiu, D.K.Y.</creatorcontrib><title>An evolutionary clustering algorithm for gene expression microarray data analysis</title><title>IEEE transactions on evolutionary computation</title><addtitle>TEVC</addtitle><description>Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. 
For more effective clustering of gene expression microarray data, which is typically characterized by a lot of noise, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on such encoding scheme, it makes use of a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that the EvoCluster adopts is able to differentiate between how relevant a feature value is in determining a particular cluster grouping. As such, instead of just local pairwise distances, it also takes into consideration how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. 
Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.</description><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Binding sites</subject><subject>Bioinformatics</subject><subject>Biological cells</subject><subject>Biological system modeling</subject><subject>Chromosomes</subject><subject>Clustering</subject><subject>Clustering algorithms</subject><subject>Clusters</subject><subject>Computer science; control theory; systems</subject><subject>Computer simulation</subject><subject>Data analysis</subject><subject>DNA sequence analysis</subject><subject>Encoding</subject><subject>Evolutionary</subject><subject>Evolutionary algorithms</subject><subject>evolutionary algorithms (EAs)</subject><subject>Evolutionary computation</subject><subject>Exact sciences and technology</subject><subject>Gene expression</subject><subject>gene expression microarray data analysis</subject><subject>Learning and adaptive systems</subject><subject>Studies</subject><subject>Testing</subject><issn>1089-778X</issn><issn>1941-0026</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNp90U1LxDAQBuAiCurqXfASBPXUddKkaXKUxS8QRFDxVqbpdI102zVpxf33ZllB8OApgTzzwuRNkiMOU87BXDxdvcymGUA-1bkRBd9K9riRPAXI1Ha8gzZpUejX3WQ_hHcALnNu9pLHy47RZ9-Og-s79Ctm2zEM5F03Z9jOe--GtwVres_m1BGjr6WnEKJlC2d9j97jitU4IMMO21Vw4SDZabANdPhzTpLn66un2W16_3BzN7u8T63M8iHVWDQVVgSYS6pQC8hIW2WwMjnYWlsJxlYam5pMg0ailkBCZrLmvKhFLSbJ-SZ36fuPkcJQLlyw1LbYUT-GUhvFtdYgojz7V2YaFEihIjz5A9_70ce9YprKlRIiLyKCDYrrh-CpKZfeLeLXlRzKdRXluopyXUW5qSKOnP7kYrDYNh4768LvXKGl4TKL7njjHBH9PitRKG3ENzedk2I</recordid><startdate>20060601</startdate><enddate>20060601</enddate><creator>Ma, P.C.H.</creator><creator>Chan, 
K.C.C.</creator><creator>Xin Yao</creator><creator>Chiu, D.K.Y.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20060601</creationdate><title>An evolutionary clustering algorithm for gene expression microarray data analysis</title><author>Ma, P.C.H. ; Chan, K.C.C. ; Xin Yao ; Chiu, D.K.Y.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c425t-8a7fbabe0a54eba8302e8c69ab950cd8c409cb8afde9fa94a840e3424d117d3d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Binding sites</topic><topic>Bioinformatics</topic><topic>Biological cells</topic><topic>Biological system modeling</topic><topic>Chromosomes</topic><topic>Clustering</topic><topic>Clustering algorithms</topic><topic>Clusters</topic><topic>Computer science; control theory; systems</topic><topic>Computer simulation</topic><topic>Data analysis</topic><topic>DNA sequence analysis</topic><topic>Encoding</topic><topic>Evolutionary</topic><topic>Evolutionary algorithms</topic><topic>evolutionary algorithms (EAs)</topic><topic>Evolutionary computation</topic><topic>Exact sciences and technology</topic><topic>Gene expression</topic><topic>gene expression microarray data analysis</topic><topic>Learning and adaptive systems</topic><topic>Studies</topic><topic>Testing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ma, 
P.C.H.</creatorcontrib><creatorcontrib>Chan, K.C.C.</creatorcontrib><creatorcontrib>Xin Yao</creatorcontrib><creatorcontrib>Chiu, D.K.Y.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on evolutionary computation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ma, P.C.H.</au><au>Chan, K.C.C.</au><au>Xin Yao</au><au>Chiu, D.K.Y.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An evolutionary clustering algorithm for gene expression microarray data analysis</atitle><jtitle>IEEE transactions on evolutionary computation</jtitle><stitle>TEVC</stitle><date>2006-06-01</date><risdate>2006</risdate><volume>10</volume><issue>3</issue><spage>296</spage><epage>314</epage><pages>296-314</pages><issn>1089-778X</issn><eissn>1941-0026</eissn><coden>ITEVF5</coden><abstract>Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. 
In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which is typically characterized by a lot of noise, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on such encoding scheme, it makes use of a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that the EvoCluster adopts is able to differentiate between how relevant a feature value is in determining a particular cluster grouping. As such, instead of just local pairwise distances, it also takes into consideration how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.</abstract><cop>New York, NY</cop><pub>IEEE</pub><doi>10.1109/TEVC.2005.859371</doi><tpages>19</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1089-778X
ispartof IEEE transactions on evolutionary computation, 2006-06, Vol.10 (3), p.296-314
issn 1089-778X
1941-0026
language eng
recordid cdi_proquest_miscellaneous_896188803
source IEEE Electronic Library (IEL)
subjects Algorithms
Applied sciences
Artificial intelligence
Binding sites
Bioinformatics
Biological cells
Biological system modeling
Chromosomes
Clustering
Clustering algorithms
Clusters
Computer science; control theory; systems
Computer simulation
Data analysis
DNA sequence analysis
Encoding
Evolutionary
Evolutionary algorithms
evolutionary algorithms (EAs)
Evolutionary computation
Exact sciences and technology
Gene expression
gene expression microarray data analysis
Learning and adaptive systems
Studies
Testing
title An evolutionary clustering algorithm for gene expression microarray data analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T14%3A56%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20evolutionary%20clustering%20algorithm%20for%20gene%20expression%20microarray%20data%20analysis&rft.jtitle=IEEE%20transactions%20on%20evolutionary%20computation&rft.au=Ma,%20P.C.H.&rft.date=2006-06-01&rft.volume=10&rft.issue=3&rft.spage=296&rft.epage=314&rft.pages=296-314&rft.issn=1089-778X&rft.eissn=1941-0026&rft.coden=ITEVF5&rft_id=info:doi/10.1109/TEVC.2005.859371&rft_dat=%3Cproquest_RIE%3E2342479111%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=865663357&rft_id=info:pmid/&rft_ieee_id=1637689&rfr_iscdi=true