A data-driven clustering method for time course gene expression data

Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that shar...

Bibliographic details
Published in:	Nucleic acids research 2006-01, Vol.34 (4), p.1261-1269
Main authors:	Ma, Ping; Castillo-Davis, Cristian I.; Zhong, Wenxuan; Liu, Jun S.
Format:	Article
Language:	English
Subjects:	Algorithms; Animals; Caenorhabditis elegans - embryology; Caenorhabditis elegans - genetics; Caenorhabditis elegans - growth & development; Cluster Analysis; Computer Simulation; Drosophila melanogaster - embryology; Drosophila melanogaster - genetics; Drosophila melanogaster - growth & development; Gene Expression; Gene Expression Profiling - methods; Internet; Kinetics; Models, Statistical; Oligonucleotide Array Sequence Analysis - methods; Software
Online access:	Full text

container_end_page	1269
container_issue	4
container_start_page	1261
container_title	Nucleic acids research
container_volume	34
creator	Ma, Ping; Castillo-Davis, Cristian I.; Zhong, Wenxuan; Liu, Jun S.
description	Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).
doi_str_mv	10.1093/nar/gkl013
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1388097</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>67707531</sourcerecordid><originalsourceid>FETCH-LOGICAL-c441t-510c24ef2d24122c6326ce20ce9a688895fcd6aaa9f1595f8b8df02d144a8a93</originalsourceid><addsrcrecordid>eNpdkU1PFTEUhhsjkQu68QeYiQsWJAM9_ZrOxoSgl0skccPCuGlK58ylMNNe2xmC_97ivUFl1TTn6dv35CHkPdAToC0_DTadru8HCvwVWQBXrBatYq_JgnIqa6BC75ODnO8oBQFSvCH7oCRQLdmCfD6rOjvZukv-AUPlhjlPmHxYVyNOt7Gr-piqyY9YuTinjNUaA1b4uEmYs4_hz-u3ZK-3Q8Z3u_OQXC-_XJ-v6qtvF5fnZ1e1EwKmuvzpmMCedUwAY05xphwy6rC1Smvdyt51ylrb9iDLRd_orqesAyGsti0_JJ-2sZv5ZsTOYZiSHcwm-dGmXyZab_6fBH9r1vHBANeatk0JONoFpPhzxjyZ0WeHw2ADxjkb1TS0kRwK-PEFeFe2D2U3wyhVoLV4SjveQi7FnBP2z02AmicxpogxWzEF_vBv97_ozkQB6i3gi4DH57lN96UWb6RZff9h5MVSfRViaVb8N_sTmZ4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200618847</pqid></control><display><type>article</type><title>A data-driven clustering method for time course gene expression data</title><source>Open Access: PubMed Central</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Oxford Journals Open Access Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Ma, Ping ; Castillo-Davis, Cristian I. ; Zhong, Wenxuan ; Liu, Jun S.</creator><creatorcontrib>Ma, Ping ; Castillo-Davis, Cristian I. ; Zhong, Wenxuan ; Liu, Jun S.</creatorcontrib><description>Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. 
Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. 
Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).</description><identifier>ISSN: 0305-1048</identifier><identifier>ISSN: 1362-4962</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkl013</identifier><identifier>PMID: 16510852</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Animals ; Caenorhabditis elegans - embryology ; Caenorhabditis elegans - genetics ; Caenorhabditis elegans - growth & development ; Cluster Analysis ; Computer Simulation ; Drosophila melanogaster - embryology ; Drosophila melanogaster - genetics ; Drosophila melanogaster - growth & development ; Gene Expression ; Gene Expression Profiling - methods ; Internet ; Kinetics ; Models, Statistical ; Oligonucleotide Array Sequence Analysis - methods ; Software</subject><ispartof>Nucleic acids research, 2006-01, Vol.34 (4), p.1261-1269</ispartof><rights>Copyright Oxford University Press(England) 2006</rights><rights>The Author 2006. Published by Oxford University Press. 
All rights reserved 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c441t-510c24ef2d24122c6326ce20ce9a688895fcd6aaa9f1595f8b8df02d144a8a93</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1388097/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1388097/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16510852$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Ma, Ping</creatorcontrib><creatorcontrib>Castillo-Davis, Cristian I.</creatorcontrib><creatorcontrib>Zhong, Wenxuan</creatorcontrib><creatorcontrib>Liu, Jun S.</creatorcontrib><title>A data-driven clustering method for time course gene expression data</title><title>Nucleic acids research</title><addtitle>Nucl. Acids Res</addtitle><description>Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. 
Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).</description><subject>Algorithms</subject><subject>Animals</subject><subject>Caenorhabditis elegans - embryology</subject><subject>Caenorhabditis elegans - genetics</subject><subject>Caenorhabditis elegans - growth & development</subject><subject>Cluster Analysis</subject><subject>Computer Simulation</subject><subject>Drosophila melanogaster - embryology</subject><subject>Drosophila melanogaster - genetics</subject><subject>Drosophila melanogaster - growth & development</subject><subject>Gene Expression</subject><subject>Gene Expression Profiling - methods</subject><subject>Internet</subject><subject>Kinetics</subject><subject>Models, Statistical</subject><subject>Oligonucleotide Array Sequence Analysis - 
methods</subject><subject>Software</subject><issn>0305-1048</issn><issn>1362-4962</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpdkU1PFTEUhhsjkQu68QeYiQsWJAM9_ZrOxoSgl0skccPCuGlK58ylMNNe2xmC_97ivUFl1TTn6dv35CHkPdAToC0_DTadru8HCvwVWQBXrBatYq_JgnIqa6BC75ODnO8oBQFSvCH7oCRQLdmCfD6rOjvZukv-AUPlhjlPmHxYVyNOt7Gr-piqyY9YuTinjNUaA1b4uEmYs4_hz-u3ZK-3Q8Z3u_OQXC-_XJ-v6qtvF5fnZ1e1EwKmuvzpmMCedUwAY05xphwy6rC1Smvdyt51ylrb9iDLRd_orqesAyGsti0_JJ-2sZv5ZsTOYZiSHcwm-dGmXyZab_6fBH9r1vHBANeatk0JONoFpPhzxjyZ0WeHw2ADxjkb1TS0kRwK-PEFeFe2D2U3wyhVoLV4SjveQi7FnBP2z02AmicxpogxWzEF_vBv97_ozkQB6i3gi4DH57lN96UWb6RZff9h5MVSfRViaVb8N_sTmZ4</recordid><startdate>20060101</startdate><enddate>20060101</enddate><creator>Ma, Ping</creator><creator>Castillo-Davis, Cristian I.</creator><creator>Zhong, Wenxuan</creator><creator>Liu, Jun S.</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20060101</creationdate><title>A data-driven clustering method for time course gene expression data</title><author>Ma, Ping ; Castillo-Davis, Cristian I. 
; Zhong, Wenxuan ; Liu, Jun S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c441t-510c24ef2d24122c6326ce20ce9a688895fcd6aaa9f1595f8b8df02d144a8a93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Animals</topic><topic>Caenorhabditis elegans - embryology</topic><topic>Caenorhabditis elegans - genetics</topic><topic>Caenorhabditis elegans - growth & development</topic><topic>Cluster Analysis</topic><topic>Computer Simulation</topic><topic>Drosophila melanogaster - embryology</topic><topic>Drosophila melanogaster - genetics</topic><topic>Drosophila melanogaster - growth & development</topic><topic>Gene Expression</topic><topic>Gene Expression Profiling - methods</topic><topic>Internet</topic><topic>Kinetics</topic><topic>Models, Statistical</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ma, Ping</creatorcontrib><creatorcontrib>Castillo-Davis, Cristian I.</creatorcontrib><creatorcontrib>Zhong, Wenxuan</creatorcontrib><creatorcontrib>Liu, Jun S.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research 
Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ma, Ping</au><au>Castillo-Davis, Cristian I.</au><au>Zhong, Wenxuan</au><au>Liu, Jun S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A data-driven clustering method for time course gene expression data</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucl. Acids Res</addtitle><date>2006-01-01</date><risdate>2006</risdate><volume>34</volume><issue>4</issue><spage>1261</spage><epage>1269</epage><pages>1261-1269</pages><issn>0305-1048</issn><issn>1362-4962</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. 
Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>16510852</pmid><doi>10.1093/nar/gkl013</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0305-1048
ispartof	Nucleic acids research, 2006-01, Vol.34 (4), p.1261-1269
issn	0305-1048 1362-4962 1362-4962
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1388097
source	Open Access: PubMed Central; MEDLINE; DOAJ Directory of Open Access Journals; Oxford Journals Open Access Collection; Free Full-Text Journals in Chemistry
subjects	Algorithms; Animals; Caenorhabditis elegans - embryology; Caenorhabditis elegans - genetics; Caenorhabditis elegans - growth & development; Cluster Analysis; Computer Simulation; Drosophila melanogaster - embryology; Drosophila melanogaster - genetics; Drosophila melanogaster - growth & development; Gene Expression; Gene Expression Profiling - methods; Internet; Kinetics; Models, Statistical; Oligonucleotide Array Sequence Analysis - methods; Software
title	A data-driven clustering method for time course gene expression data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T15%3A14%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20data-driven%20clustering%20method%20for%20time%20course%20gene%20expression%20data&rft.jtitle=Nucleic%20acids%20research&rft.au=Ma,%20Ping&rft.date=2006-01-01&rft.volume=34&rft.issue=4&rft.spage=1261&rft.epage=1269&rft.pages=1261-1269&rft.issn=0305-1048&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/gkl013&rft_dat=%3Cproquest_pubme%3E67707531%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200618847&rft_id=info:pmid/16510852&rfr_iscdi=true