A data-driven clustering method for time course gene expression data
Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that shar...
Published in: | Nucleic acids research 2006-01, Vol.34 (4), p.1261-1269 |
---|---|
Main authors: | Ma, Ping; Castillo-Davis, Cristian I.; Zhong, Wenxuan; Liu, Jun S. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1269 |
---|---|
container_issue | 4 |
container_start_page | 1261 |
container_title | Nucleic acids research |
container_volume | 34 |
creator | Ma, Ping; Castillo-Davis, Cristian I.; Zhong, Wenxuan; Liu, Jun S. |
description | Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html). |
doi_str_mv | 10.1093/nar/gkl013 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1388097</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>67707531</sourcerecordid><originalsourceid>FETCH-LOGICAL-c441t-510c24ef2d24122c6326ce20ce9a688895fcd6aaa9f1595f8b8df02d144a8a93</originalsourceid><addsrcrecordid>eNpdkU1PFTEUhhsjkQu68QeYiQsWJAM9_ZrOxoSgl0skccPCuGlK58ylMNNe2xmC_97ivUFl1TTn6dv35CHkPdAToC0_DTadru8HCvwVWQBXrBatYq_JgnIqa6BC75ODnO8oBQFSvCH7oCRQLdmCfD6rOjvZukv-AUPlhjlPmHxYVyNOt7Gr-piqyY9YuTinjNUaA1b4uEmYs4_hz-u3ZK-3Q8Z3u_OQXC-_XJ-v6qtvF5fnZ1e1EwKmuvzpmMCedUwAY05xphwy6rC1Smvdyt51ylrb9iDLRd_orqesAyGsti0_JJ-2sZv5ZsTOYZiSHcwm-dGmXyZab_6fBH9r1vHBANeatk0JONoFpPhzxjyZ0WeHw2ADxjkb1TS0kRwK-PEFeFe2D2U3wyhVoLV4SjveQi7FnBP2z02AmicxpogxWzEF_vBv97_ozkQB6i3gi4DH57lN96UWb6RZff9h5MVSfRViaVb8N_sTmZ4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200618847</pqid></control><display><type>article</type><title>A data-driven clustering method for time course gene expression data</title><source>Open Access: PubMed Central</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Oxford Journals Open Access Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Ma, Ping ; Castillo-Davis, Cristian I. ; Zhong, Wenxuan ; Liu, Jun S.</creator><creatorcontrib>Ma, Ping ; Castillo-Davis, Cristian I. ; Zhong, Wenxuan ; Liu, Jun S.</creatorcontrib><description>Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. 
Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. 
Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).</description><identifier>ISSN: 0305-1048</identifier><identifier>ISSN: 1362-4962</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkl013</identifier><identifier>PMID: 16510852</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Animals ; Caenorhabditis elegans - embryology ; Caenorhabditis elegans - genetics ; Caenorhabditis elegans - growth & development ; Cluster Analysis ; Computer Simulation ; Drosophila melanogaster - embryology ; Drosophila melanogaster - genetics ; Drosophila melanogaster - growth & development ; Gene Expression ; Gene Expression Profiling - methods ; Internet ; Kinetics ; Models, Statistical ; Oligonucleotide Array Sequence Analysis - methods ; Software</subject><ispartof>Nucleic acids research, 2006-01, Vol.34 (4), p.1261-1269</ispartof><rights>Copyright Oxford University Press(England) 2006</rights><rights>The Author 2006. Published by Oxford University Press. 
All rights reserved 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c441t-510c24ef2d24122c6326ce20ce9a688895fcd6aaa9f1595f8b8df02d144a8a93</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1388097/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC1388097/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16510852$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Ma, Ping</creatorcontrib><creatorcontrib>Castillo-Davis, Cristian I.</creatorcontrib><creatorcontrib>Zhong, Wenxuan</creatorcontrib><creatorcontrib>Liu, Jun S.</creatorcontrib><title>A data-driven clustering method for time course gene expression data</title><title>Nucleic acids research</title><addtitle>Nucl. Acids Res</addtitle><description>Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. 
Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).</description><subject>Algorithms</subject><subject>Animals</subject><subject>Caenorhabditis elegans - embryology</subject><subject>Caenorhabditis elegans - genetics</subject><subject>Caenorhabditis elegans - growth & development</subject><subject>Cluster Analysis</subject><subject>Computer Simulation</subject><subject>Drosophila melanogaster - embryology</subject><subject>Drosophila melanogaster - genetics</subject><subject>Drosophila melanogaster - growth & development</subject><subject>Gene Expression</subject><subject>Gene Expression Profiling - methods</subject><subject>Internet</subject><subject>Kinetics</subject><subject>Models, Statistical</subject><subject>Oligonucleotide Array Sequence Analysis - 
methods</subject><subject>Software</subject><issn>0305-1048</issn><issn>1362-4962</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpdkU1PFTEUhhsjkQu68QeYiQsWJAM9_ZrOxoSgl0skccPCuGlK58ylMNNe2xmC_97ivUFl1TTn6dv35CHkPdAToC0_DTadru8HCvwVWQBXrBatYq_JgnIqa6BC75ODnO8oBQFSvCH7oCRQLdmCfD6rOjvZukv-AUPlhjlPmHxYVyNOt7Gr-piqyY9YuTinjNUaA1b4uEmYs4_hz-u3ZK-3Q8Z3u_OQXC-_XJ-v6qtvF5fnZ1e1EwKmuvzpmMCedUwAY05xphwy6rC1Smvdyt51ylrb9iDLRd_orqesAyGsti0_JJ-2sZv5ZsTOYZiSHcwm-dGmXyZab_6fBH9r1vHBANeatk0JONoFpPhzxjyZ0WeHw2ADxjkb1TS0kRwK-PEFeFe2D2U3wyhVoLV4SjveQi7FnBP2z02AmicxpogxWzEF_vBv97_ozkQB6i3gi4DH57lN96UWb6RZff9h5MVSfRViaVb8N_sTmZ4</recordid><startdate>20060101</startdate><enddate>20060101</enddate><creator>Ma, Ping</creator><creator>Castillo-Davis, Cristian I.</creator><creator>Zhong, Wenxuan</creator><creator>Liu, Jun S.</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20060101</creationdate><title>A data-driven clustering method for time course gene expression data</title><author>Ma, Ping ; Castillo-Davis, Cristian I. 
; Zhong, Wenxuan ; Liu, Jun S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c441t-510c24ef2d24122c6326ce20ce9a688895fcd6aaa9f1595f8b8df02d144a8a93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Animals</topic><topic>Caenorhabditis elegans - embryology</topic><topic>Caenorhabditis elegans - genetics</topic><topic>Caenorhabditis elegans - growth & development</topic><topic>Cluster Analysis</topic><topic>Computer Simulation</topic><topic>Drosophila melanogaster - embryology</topic><topic>Drosophila melanogaster - genetics</topic><topic>Drosophila melanogaster - growth & development</topic><topic>Gene Expression</topic><topic>Gene Expression Profiling - methods</topic><topic>Internet</topic><topic>Kinetics</topic><topic>Models, Statistical</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ma, Ping</creatorcontrib><creatorcontrib>Castillo-Davis, Cristian I.</creatorcontrib><creatorcontrib>Zhong, Wenxuan</creatorcontrib><creatorcontrib>Liu, Jun S.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research 
Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ma, Ping</au><au>Castillo-Davis, Cristian I.</au><au>Zhong, Wenxuan</au><au>Liu, Jun S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A data-driven clustering method for time course gene expression data</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucl. Acids Res</addtitle><date>2006-01-01</date><risdate>2006</risdate><volume>34</volume><issue>4</issue><spage>1261</spage><epage>1269</epage><pages>1261-1269</pages><issn>0305-1048</issn><issn>1362-4962</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. 
Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>16510852</pmid><doi>10.1093/nar/gkl013</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0305-1048 |
ispartof | Nucleic acids research, 2006-01, Vol.34 (4), p.1261-1269 |
issn | ISSN: 0305-1048, 1362-4962; EISSN: 1362-4962 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_1388097 |
source | Open Access: PubMed Central; MEDLINE; DOAJ Directory of Open Access Journals; Oxford Journals Open Access Collection; Free Full-Text Journals in Chemistry |
subjects | Algorithms; Animals; Caenorhabditis elegans - embryology; Caenorhabditis elegans - genetics; Caenorhabditis elegans - growth & development; Cluster Analysis; Computer Simulation; Drosophila melanogaster - embryology; Drosophila melanogaster - genetics; Drosophila melanogaster - growth & development; Gene Expression; Gene Expression Profiling - methods; Internet; Kinetics; Models, Statistical; Oligonucleotide Array Sequence Analysis - methods; Software |
title | A data-driven clustering method for time course gene expression data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T15%3A14%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20data-driven%20clustering%20method%20for%20time%20course%20gene%20expression%20data&rft.jtitle=Nucleic%20acids%20research&rft.au=Ma,%20Ping&rft.date=2006-01-01&rft.volume=34&rft.issue=4&rft.spage=1261&rft.epage=1269&rft.pages=1261-1269&rft.issn=0305-1048&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/gkl013&rft_dat=%3Cproquest_pubme%3E67707531%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200618847&rft_id=info:pmid/16510852&rfr_iscdi=true |
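
The description above says that SSC summarizes each cluster with a smooth 'mean curve' and accounts for missing data. The sketch below illustrates that one idea only and is not the SSClust implementation. As a stand-in for a smoothing spline it uses a discrete penalized least-squares smoother on equally spaced time points. The function names, the `lam` parameter, and the NaN convention for missing values are assumptions made for illustration. It does not assign genes to clusters, choose the number of clusters, or compute confidence bands.

```python
import numpy as np


def second_difference_matrix(n: int) -> np.ndarray:
    """Return the (n-2) x n matrix D with (D f)[i] = f[i] - 2*f[i+1] + f[i+2]."""
    return np.diff(np.eye(n), n=2, axis=0)


def cluster_mean_curve(profiles: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Estimate a smooth mean curve for one cluster of expression profiles.

    profiles: array of shape (genes, time points); NaN marks a missing value.
    lam: roughness penalty weight (hypothetical name); larger gives a smoother curve.
    Assumes equally spaced time points and at least two time points with data.
    """
    observed = ~np.isnan(profiles)
    # Number of observed measurements at each time point.
    counts = observed.sum(axis=0).astype(float)
    # Sum of the observed values at each time point (missing entries add 0).
    sums = np.where(observed, profiles, 0.0).sum(axis=0)
    d = second_difference_matrix(profiles.shape[1])
    # Minimize: sum over observed (y_gt - f_t)^2 + lam * ||D f||^2.
    # Setting the gradient to zero gives (diag(counts) + lam * D^T D) f = sums.
    return np.linalg.solve(np.diag(counts) + lam * d.T @ d, sums)
```

Calling `cluster_mean_curve(profiles, lam=5.0)` on a genes-by-time array returns one fitted value per time point. Because the penalty acts only on second differences, a cluster whose profiles are exactly linear in time is returned unchanged for any `lam`, which makes a convenient sanity check.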