A data-driven clustering method for time course gene expression data

Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that shar...

Full description

Bibliographic details
Published in: Nucleic acids research 2006-01, Vol.34 (4), p.1261-1269
Main authors: Ma, Ping; Castillo-Davis, Cristian I.; Zhong, Wenxuan; Liu, Jun S.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1269
container_issue 4
container_start_page 1261
container_title Nucleic acids research
container_volume 34
creator Ma, Ping
Castillo-Davis, Cristian I.
Zhong, Wenxuan
Liu, Jun S.
description Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).
doi_str_mv 10.1093/nar/gkl013
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1388097</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>67707531</sourcerecordid><originalsourceid>FETCH-LOGICAL-c441t-510c24ef2d24122c6326ce20ce9a688895fcd6aaa9f1595f8b8df02d144a8a93</originalsourceid><addsrcrecordid>eNpdkU1PFTEUhhsjkQu68QeYiQsWJAM9_ZrOxoSgl0skccPCuGlK58ylMNNe2xmC_97ivUFl1TTn6dv35CHkPdAToC0_DTadru8HCvwVWQBXrBatYq_JgnIqa6BC75ODnO8oBQFSvCH7oCRQLdmCfD6rOjvZukv-AUPlhjlPmHxYVyNOt7Gr-piqyY9YuTinjNUaA1b4uEmYs4_hz-u3ZK-3Q8Z3u_OQXC-_XJ-v6qtvF5fnZ1e1EwKmuvzpmMCedUwAY05xphwy6rC1Smvdyt51ylrb9iDLRd_orqesAyGsti0_JJ-2sZv5ZsTOYZiSHcwm-dGmXyZab_6fBH9r1vHBANeatk0JONoFpPhzxjyZ0WeHw2ADxjkb1TS0kRwK-PEFeFe2D2U3wyhVoLV4SjveQi7FnBP2z02AmicxpogxWzEF_vBv97_ozkQB6i3gi4DH57lN96UWb6RZff9h5MVSfRViaVb8N_sTmZ4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200618847</pqid></control><display><type>article</type><title>A data-driven clustering method for time course gene expression data</title><source>Open Access: PubMed Central</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Oxford Journals Open Access Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Ma, Ping ; Castillo-Davis, Cristian I. ; Zhong, Wenxuan ; Liu, Jun S.</creator><creatorcontrib>Ma, Ping ; Castillo-Davis, Cristian I. ; Zhong, Wenxuan ; Liu, Jun S.</creatorcontrib><description>Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. 
Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. 
Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).</description><identifier>ISSN: 0305-1048</identifier><identifier>ISSN: 1362-4962</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkl013</identifier><identifier>PMID: 16510852</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Animals ; Caenorhabditis elegans - embryology ; Caenorhabditis elegans - genetics ; Caenorhabditis elegans - growth &amp; development ; Cluster Analysis ; Computer Simulation ; Drosophila melanogaster - embryology ; Drosophila melanogaster - genetics ; Drosophila melanogaster - growth &amp; development ; Gene Expression ; Gene Expression Profiling - methods ; Internet ; Kinetics ; Models, Statistical ; Oligonucleotide Array Sequence Analysis - methods ; Software</subject><ispartof>Nucleic acids research, 2006-01, Vol.34 (4), p.1261-1269</ispartof><rights>Copyright Oxford University Press(England) 2006</rights><rights>The Author 2006. Published by Oxford University Press. 
All rights reserved 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c441t-510c24ef2d24122c6326ce20ce9a688895fcd6aaa9f1595f8b8df02d144a8a93</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1388097/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1388097/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16510852$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Ma, Ping</creatorcontrib><creatorcontrib>Castillo-Davis, Cristian I.</creatorcontrib><creatorcontrib>Zhong, Wenxuan</creatorcontrib><creatorcontrib>Liu, Jun S.</creatorcontrib><title>A data-driven clustering method for time course gene expression data</title><title>Nucleic acids research</title><addtitle>Nucl. Acids Res</addtitle><description>Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. 
Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).</description><subject>Algorithms</subject><subject>Animals</subject><subject>Caenorhabditis elegans - embryology</subject><subject>Caenorhabditis elegans - genetics</subject><subject>Caenorhabditis elegans - growth &amp; development</subject><subject>Cluster Analysis</subject><subject>Computer Simulation</subject><subject>Drosophila melanogaster - embryology</subject><subject>Drosophila melanogaster - genetics</subject><subject>Drosophila melanogaster - growth &amp; development</subject><subject>Gene Expression</subject><subject>Gene Expression Profiling - methods</subject><subject>Internet</subject><subject>Kinetics</subject><subject>Models, Statistical</subject><subject>Oligonucleotide Array Sequence Analysis - 
methods</subject><subject>Software</subject><issn>0305-1048</issn><issn>1362-4962</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpdkU1PFTEUhhsjkQu68QeYiQsWJAM9_ZrOxoSgl0skccPCuGlK58ylMNNe2xmC_97ivUFl1TTn6dv35CHkPdAToC0_DTadru8HCvwVWQBXrBatYq_JgnIqa6BC75ODnO8oBQFSvCH7oCRQLdmCfD6rOjvZukv-AUPlhjlPmHxYVyNOt7Gr-piqyY9YuTinjNUaA1b4uEmYs4_hz-u3ZK-3Q8Z3u_OQXC-_XJ-v6qtvF5fnZ1e1EwKmuvzpmMCedUwAY05xphwy6rC1Smvdyt51ylrb9iDLRd_orqesAyGsti0_JJ-2sZv5ZsTOYZiSHcwm-dGmXyZab_6fBH9r1vHBANeatk0JONoFpPhzxjyZ0WeHw2ADxjkb1TS0kRwK-PEFeFe2D2U3wyhVoLV4SjveQi7FnBP2z02AmicxpogxWzEF_vBv97_ozkQB6i3gi4DH57lN96UWb6RZff9h5MVSfRViaVb8N_sTmZ4</recordid><startdate>20060101</startdate><enddate>20060101</enddate><creator>Ma, Ping</creator><creator>Castillo-Davis, Cristian I.</creator><creator>Zhong, Wenxuan</creator><creator>Liu, Jun S.</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20060101</creationdate><title>A data-driven clustering method for time course gene expression data</title><author>Ma, Ping ; Castillo-Davis, Cristian I. 
; Zhong, Wenxuan ; Liu, Jun S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c441t-510c24ef2d24122c6326ce20ce9a688895fcd6aaa9f1595f8b8df02d144a8a93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Animals</topic><topic>Caenorhabditis elegans - embryology</topic><topic>Caenorhabditis elegans - genetics</topic><topic>Caenorhabditis elegans - growth &amp; development</topic><topic>Cluster Analysis</topic><topic>Computer Simulation</topic><topic>Drosophila melanogaster - embryology</topic><topic>Drosophila melanogaster - genetics</topic><topic>Drosophila melanogaster - growth &amp; development</topic><topic>Gene Expression</topic><topic>Gene Expression Profiling - methods</topic><topic>Internet</topic><topic>Kinetics</topic><topic>Models, Statistical</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ma, Ping</creatorcontrib><creatorcontrib>Castillo-Davis, Cristian I.</creatorcontrib><creatorcontrib>Zhong, Wenxuan</creatorcontrib><creatorcontrib>Liu, Jun S.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research 
Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ma, Ping</au><au>Castillo-Davis, Cristian I.</au><au>Zhong, Wenxuan</au><au>Liu, Jun S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A data-driven clustering method for time course gene expression data</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucl. Acids Res</addtitle><date>2006-01-01</date><risdate>2006</risdate><volume>34</volume><issue>4</issue><spage>1261</spage><epage>1269</epage><pages>1261-1269</pages><issn>0305-1048</issn><issn>1362-4962</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. 
Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>16510852</pmid><doi>10.1093/nar/gkl013</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0305-1048
ispartof Nucleic acids research, 2006-01, Vol.34 (4), p.1261-1269
issn 0305-1048
1362-4962
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1388097
source Open Access: PubMed Central; MEDLINE; DOAJ Directory of Open Access Journals; Oxford Journals Open Access Collection; Free Full-Text Journals in Chemistry
subjects Algorithms
Animals
Caenorhabditis elegans - embryology
Caenorhabditis elegans - genetics
Caenorhabditis elegans - growth & development
Cluster Analysis
Computer Simulation
Drosophila melanogaster - embryology
Drosophila melanogaster - genetics
Drosophila melanogaster - growth & development
Gene Expression
Gene Expression Profiling - methods
Internet
Kinetics
Models, Statistical
Oligonucleotide Array Sequence Analysis - methods
Software
title A data-driven clustering method for time course gene expression data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T15%3A14%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20data-driven%20clustering%20method%20for%20time%20course%20gene%20expression%20data&rft.jtitle=Nucleic%20acids%20research&rft.au=Ma,%20Ping&rft.date=2006-01-01&rft.volume=34&rft.issue=4&rft.spage=1261&rft.epage=1269&rft.pages=1261-1269&rft.issn=0305-1048&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/gkl013&rft_dat=%3Cproquest_pubme%3E67707531%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200618847&rft_id=info:pmid/16510852&rfr_iscdi=true