Fuzzy c-means clustering with prior biological knowledge

We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology anno...

Bibliographic details
Published in: Journal of biomedical informatics 2009-02, Vol.42 (1), p.74-81
Main authors: Tari, Luis, Baral, Chitta, Kim, Seungchan
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 81
container_issue 1
container_start_page 74
container_title Journal of biomedical informatics
container_volume 42
creator Tari, Luis
Baral, Chitta
Kim, Seungchan
description We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
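Note on the description: the abstract states that the method builds on fuzzy c-means and can assign a gene to multiple clusters. As a minimal sketch of that underlying idea only, the plain-Python code below implements the standard fuzzy c-means updates (membership-weighted centers, then distance-based memberships). It is not the GO Fuzzy c-means method from the article and uses no Gene Ontology annotations; the function name `fuzzy_c_means`, its parameters, and the toy `points` data are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative; not the GO-guided variant).

    points: list of equal-length lists of floats (e.g. expression profiles)
    c: number of clusters; m: fuzzifier, must be > 1
    Returns (centers, u), where u[i][k] is the degree to which point i
    belongs to cluster k and each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by u[i][k] ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [
                sum((points[i][d] - centers[k][d]) ** 2
                    for d in range(dim)) ** 0.5
                for k in range(c)
            ]
            zeros = [k for k in range(c) if dist[k] == 0.0]
            if zeros:
                # Point coincides with one or more centers: split membership.
                new = [1.0 / len(zeros) if k in zeros else 0.0
                       for k in range(c)]
            else:
                inv = [dk ** (-2.0 / (m - 1.0)) for dk in dist]
                total = sum(inv)
                new = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Toy data (hypothetical): two tight groups and one point in between.
points = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [2.55, 2.55],
]
centers, u = fuzzy_c_means(points, c=2)
for p, row in zip(points, u):
    print(p, [round(v, 3) for v in row])
```

In this toy run the points in each tight group receive a membership close to 1 for one cluster, while the in-between point receives a substantial membership in both. This is the soft-assignment behaviour the abstract refers to; how prior annotations enter the procedure is specific to the article and is not reproduced here.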
doi_str_mv 10.1016/j.jbi.2008.05.009
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2673503</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1532046408000798</els_id><sourcerecordid>66950562</sourcerecordid><originalsourceid>FETCH-LOGICAL-c480t-507d20314781151e7aba8bc255bb0855788a2dbf60b0506d3b878851c1efe3473</originalsourceid><addsrcrecordid>eNqFkU1LxDAQhoMofqz-AC_Sk7fWSbuTZhEEEb9A8KLnkKSza9Zuo0mr6K83sosfFz0lJO-88848jO1zKDhwcTQv5sYVJYAsAAuAyRrb5liVOYwlrH_dxXiL7cQ4B-AcUWyyLS5xgnU92WbyYnh_f8tsviDdxcy2Q-wpuG6Wvbr-IXsKzofMON_6mbO6zR47_9pSM6NdtjHVbaS91Tli9xfnd2dX-c3t5fXZ6U1uU4Y-R6ibEio-rmVqzqnWRktjS0RjQCLWUuqyMVMBBhBEUxmZnpBbTlOqxnU1YidL36fBLKix1PVBtyoFW-jwprx26vdP5x7UzL-oUtQVQpUMDlcGwT8PFHu1cNFS2-qO_BCVEBMEFOW_wjQGVpB2OmJ8KbTBxxho-pWGg_oEo-YqgVGfYBSgSmBSzcHPMb4rViSS4HgpoLTMF0dBReuos9S4QLZXjXd_2H8AKO2ePw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>20353015</pqid></control><display><type>article</type><title>Fuzzy c-means clustering with prior biological knowledge</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Tari, Luis ; Baral, Chitta ; Kim, Seungchan</creator><creatorcontrib>Tari, Luis ; Baral, Chitta ; Kim, Seungchan</creatorcontrib><description>We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. 
Two datasets of yeast ( Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.</description><identifier>ISSN: 1532-0464</identifier><identifier>EISSN: 1532-0480</identifier><identifier>DOI: 10.1016/j.jbi.2008.05.009</identifier><identifier>PMID: 18595779</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Algorithms ; Cluster Analysis ; Computational Biology ; Databases, Genetic ; Fuzzy c-means clustering ; Fuzzy Logic ; Gene expression data ; Gene Expression Profiling - methods ; Gene function prediction ; Gene Ontology ; Genes - physiology ; Genes, Fungal - physiology ; Internet ; Normal Distribution ; Oligonucleotide Array Sequence Analysis ; Reproducibility of Results ; Saccharomyces cerevisiae ; Saccharomyces cerevisiae - genetics ; Saccharomyces cerevisiae yeast ; Semi-supervised clustering ; Software</subject><ispartof>Journal of biomedical informatics, 2009-02, Vol.42 (1), p.74-81</ispartof><rights>2008 Elsevier 
Inc.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c480t-507d20314781151e7aba8bc255bb0855788a2dbf60b0506d3b878851c1efe3473</citedby><cites>FETCH-LOGICAL-c480t-507d20314781151e7aba8bc255bb0855788a2dbf60b0506d3b878851c1efe3473</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S1532046408000798$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>230,314,776,780,881,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/18595779$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Tari, Luis</creatorcontrib><creatorcontrib>Baral, Chitta</creatorcontrib><creatorcontrib>Kim, Seungchan</creatorcontrib><title>Fuzzy c-means clustering with prior biological knowledge</title><title>Journal of biomedical informatics</title><addtitle>J Biomed Inform</addtitle><description>We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast ( Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. 
Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.</description><subject>Algorithms</subject><subject>Cluster Analysis</subject><subject>Computational Biology</subject><subject>Databases, Genetic</subject><subject>Fuzzy c-means clustering</subject><subject>Fuzzy Logic</subject><subject>Gene expression data</subject><subject>Gene Expression Profiling - methods</subject><subject>Gene function prediction</subject><subject>Gene Ontology</subject><subject>Genes - physiology</subject><subject>Genes, Fungal - physiology</subject><subject>Internet</subject><subject>Normal Distribution</subject><subject>Oligonucleotide Array Sequence Analysis</subject><subject>Reproducibility of Results</subject><subject>Saccharomyces cerevisiae</subject><subject>Saccharomyces cerevisiae - genetics</subject><subject>Saccharomyces cerevisiae yeast</subject><subject>Semi-supervised clustering</subject><subject>Software</subject><issn>1532-0464</issn><issn>1532-0480</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkU1LxDAQhoMofqz-AC_Sk7fWSbuTZhEEEb9A8KLnkKSza9Zuo0mr6K83sosfFz0lJO-88848jO1zKDhwcTQv5sYVJYAsAAuAyRrb5liVOYwlrH_dxXiL7cQ4B-AcUWyyLS5xgnU92WbyYnh_f8tsviDdxcy2Q-wpuG6Wvbr-IXsKzofMON_6mbO6zR47_9pSM6NdtjHVbaS91Tli9xfnd2dX-c3t5fXZ6U1uU4Y-R6ibEio-rmVqzqnWRktjS0RjQCLWUuqyMVMBBhBEUxmZnpBbTlOqxnU1YidL36fBLKix1PVBtyoFW-jwprx26vdP5x7UzL-oUtQVQpUMDlcGwT8PFHu1cNFS2-qO_BCVEBMEFOW_wjQGVpB2OmJ8KbTBxxho-pWGg_oEo-YqgVGfYBSgSmBSzcHPMb4rViSS4HgpoLTMF0dBReuos9S4QLZXjXd_2H8AKO2ePw</recordid><startdate>20090201</startdate><enddate>20090201</enddate><creator>Tari, 
Luis</creator><creator>Baral, Chitta</creator><creator>Kim, Seungchan</creator><general>Elsevier Inc</general><scope>6I.</scope><scope>AAFTH</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>8FD</scope><scope>FR3</scope><scope>M7N</scope><scope>P64</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20090201</creationdate><title>Fuzzy c-means clustering with prior biological knowledge</title><author>Tari, Luis ; Baral, Chitta ; Kim, Seungchan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c480t-507d20314781151e7aba8bc255bb0855788a2dbf60b0506d3b878851c1efe3473</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Algorithms</topic><topic>Cluster Analysis</topic><topic>Computational Biology</topic><topic>Databases, Genetic</topic><topic>Fuzzy c-means clustering</topic><topic>Fuzzy Logic</topic><topic>Gene expression data</topic><topic>Gene Expression Profiling - methods</topic><topic>Gene function prediction</topic><topic>Gene Ontology</topic><topic>Genes - physiology</topic><topic>Genes, Fungal - physiology</topic><topic>Internet</topic><topic>Normal Distribution</topic><topic>Oligonucleotide Array Sequence Analysis</topic><topic>Reproducibility of Results</topic><topic>Saccharomyces cerevisiae</topic><topic>Saccharomyces cerevisiae - genetics</topic><topic>Saccharomyces cerevisiae yeast</topic><topic>Semi-supervised clustering</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tari, Luis</creatorcontrib><creatorcontrib>Baral, Chitta</creatorcontrib><creatorcontrib>Kim, Seungchan</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open 
Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of biomedical informatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tari, Luis</au><au>Baral, Chitta</au><au>Kim, Seungchan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Fuzzy c-means clustering with prior biological knowledge</atitle><jtitle>Journal of biomedical informatics</jtitle><addtitle>J Biomed Inform</addtitle><date>2009-02-01</date><risdate>2009</risdate><volume>42</volume><issue>1</issue><spage>74</spage><epage>81</epage><pages>74-81</pages><issn>1532-0464</issn><eissn>1532-0480</eissn><abstract>We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. 
Two datasets of yeast ( Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>18595779</pmid><doi>10.1016/j.jbi.2008.05.009</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1532-0464
ispartof Journal of biomedical informatics, 2009-02, Vol.42 (1), p.74-81
issn 1532-0464
1532-0480
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2673503
source MEDLINE; Elsevier ScienceDirect Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Algorithms
Cluster Analysis
Computational Biology
Databases, Genetic
Fuzzy c-means clustering
Fuzzy Logic
Gene expression data
Gene Expression Profiling - methods
Gene function prediction
Gene Ontology
Genes - physiology
Genes, Fungal - physiology
Internet
Normal Distribution
Oligonucleotide Array Sequence Analysis
Reproducibility of Results
Saccharomyces cerevisiae
Saccharomyces cerevisiae - genetics
Saccharomyces cerevisiae yeast
Semi-supervised clustering
Software
title Fuzzy c-means clustering with prior biological knowledge
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T00%3A11%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fuzzy%20c-means%20clustering%20with%20prior%20biological%20knowledge&rft.jtitle=Journal%20of%20biomedical%20informatics&rft.au=Tari,%20Luis&rft.date=2009-02-01&rft.volume=42&rft.issue=1&rft.spage=74&rft.epage=81&rft.pages=74-81&rft.issn=1532-0464&rft.eissn=1532-0480&rft_id=info:doi/10.1016/j.jbi.2008.05.009&rft_dat=%3Cproquest_pubme%3E66950562%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=20353015&rft_id=info:pmid/18595779&rft_els_id=S1532046408000798&rfr_iscdi=true