Clustering short time series gene expression data

Motivation: Time series expression experiments are used to study a wide range of biological systems. More than 80% of all time series expression datasets are short (8 time points or fewer). These datasets present unique challenges. On account of the large number of genes profiled (often tens of thou...

Detailed description

Bibliographic details
Published in:	Bioinformatics 2005-06, Vol.21 (suppl-1), p.i159-i168
Main authors:	Ernst, Jason; Nau, Gerard J.; Bar-Joseph, Ziv
Format:	Article
Language:	eng
Subjects:	Algorithms; Cell Line, Tumor; Cluster Analysis; Computational Biology - methods; Computer Simulation; Gene Expression Profiling; Gene Expression Regulation; Helicobacter pylori - metabolism; Humans; Immune System; Internet; Models, Theoretical; Neoplasms - microbiology; Oligonucleotide Array Sequence Analysis; Programming Languages; Software; Time Factors
Online access:	Full text

container_end_page	i168
container_issue	suppl-1
container_start_page	i159
container_title	Bioinformatics
container_volume	21
creator	Ernst, Jason; Nau, Gerard J.; Bar-Joseph, Ziv
description	Motivation: Time series expression experiments are used to study a wide range of biological systems. More than 80% of all time series expression datasets are short (8 time points or fewer). These datasets present unique challenges. On account of the large number of genes profiled (often tens of thousands) and the small number of time points many patterns are expected to arise at random. Most clustering algorithms are unable to distinguish between real and random patterns. Results: We present an algorithm specifically designed for clustering short time series expression data. Our algorithm works by assigning genes to a predefined set of model profiles that capture the potential distinct patterns that can be expected from the experiment. We discuss how to obtain such a set of profiles and how to determine the significance of each of these profiles. Significant profiles are retained for further analysis and can be combined to form clusters. We tested our method on both simulated and real biological data. Using immune response data we show that our algorithm can correctly detect the temporal profile of relevant functional categories. Using Gene Ontology analysis we show that our algorithm outperforms both general clustering algorithms and algorithms designed specifically for clustering time series gene expression data. Availability: Information on obtaining a Java implementation with a graphical user interface (GUI) is available from http://www.cs.cmu.edu/~jernst/st/ Contact: jernst@cs.cmu.edu Supplementary information: Available at http://www.cs.cmu.edu/~jernst/st/
doi_str_mv	10.1093/bioinformatics/bti1022
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_67946752</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>17577079</sourcerecordid><originalsourceid>FETCH-LOGICAL-c550t-6c266d16ada670627d68fcaab7d77d58f286419aa36d2feaf458d577084198f33</originalsourceid><addsrcrecordid>eNqFkU1LAzEQhoMoVqt_oSwevK3Nd3aPUrSVVjyoIL2E7CZbU_ejJrtQ_70pXRS99DTD5Jk3DA8AIwRvEEzJOLONrYvGVaq1uR9nrUUQ4yNwhiiHMYYsPQ494SKmCSQDcO79GkKGKKWnYIBYyhFl5AygSdn51jhbryL_3rg2am1lIh8mxkcrU5vIbDfOeG-bOtKqVRfgpFClN5d9HYLX-7uXySxePE0fJreLOGcMtjHPMecacaUVF5BjoXlS5EplQguhWVLghFOUKkW4xoVRBWWJZkLAJEyTgpAhuN7nblzz2Rnfysr63JSlqk3TeclFSrlg-CAY4hJGKD0Mit3_Ig3g1T9w3XSuDtfuwrjAGMEA8T2Uu8Z7Zwq5cbZS7ksiKHeO5F9HsncUFkd9epdVRv-u9VICEO8BG8xsf96V-whHE8Hk7G0p6ez5cTqfT-SSfAOoEaCF</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198672210</pqid></control><display><type>article</type><title>Clustering short time series gene expression data</title><source>MEDLINE</source><source>Oxford Journals Open Access Collection</source><source>EZB-FREE-00999 freely available EZB journals</source><source>Alma/SFX Local Collection</source><creator>Ernst, Jason ; Nau, Gerard J. ; Bar-Joseph, Ziv</creator><creatorcontrib>Ernst, Jason ; Nau, Gerard J. ; Bar-Joseph, Ziv</creatorcontrib><description>Motivation: Time series expression experiments are used to study a wide range of biological systems. More than 80% of all time series expression datasets are short (8 time points or fewer). These datasets present unique challenges. On account of the large number of genes profiled (often tens of thousands) and the small number of time points many patterns are expected to arise at random. Most clustering algorithms are unable to distinguish between real and random patterns. Results: We present an algorithm specifically designed for clustering short time series expression data. 
Our algorithm works by assigning genes to a predefined set of model profiles that capture the potential distinct patterns that can be expected from the experiment. We discuss how to obtain such a set of profiles and how to determine the significance of each of these profiles. Significant profiles are retained for further analysis and can be combined to form clusters. We tested our method on both simulated and real biological data. Using immune response data we show that our algorithm can correctly detect the temporal profile of relevant functional categories. Using Gene Ontology analysis we show that our algorithm outperforms both general clustering algorithms and algorithms designed specifically for clustering time series gene expression data. Availability: Information on obtaining a Java implementation with a graphical user interface (GUI) is available from http://www.cs.cmu.edu/~jernst/st/ Contact: jernst@cs.cmu.edu Supplementary information: Available at http://www.cs.cmu.edu/~jernst/st/</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/bti1022</identifier><identifier>PMID: 15961453</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Cell Line, Tumor ; Cluster Analysis ; Computational Biology - methods ; Computer Simulation ; Gene Expression Profiling ; Gene Expression Regulation ; Helicobacter pylori - metabolism ; Humans ; Immune System ; Internet ; Models, Theoretical ; Neoplasms - microbiology ; Oligonucleotide Array Sequence Analysis ; Programming Languages ; Software ; Time Factors</subject><ispartof>Bioinformatics, 2005-06, Vol.21 (suppl-1), p.i159-i168</ispartof><rights>Copyright Oxford University Press(England) Jun 
2005</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c550t-6c266d16ada670627d68fcaab7d77d58f286419aa36d2feaf458d577084198f33</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27922,27923</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/15961453$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Ernst, Jason</creatorcontrib><creatorcontrib>Nau, Gerard J.</creatorcontrib><creatorcontrib>Bar-Joseph, Ziv</creatorcontrib><title>Clustering short time series gene expression data</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: Time series expression experiments are used to study a wide range of biological systems. More than 80% of all time series expression datasets are short (8 time points or fewer). These datasets present unique challenges. On account of the large number of genes profiled (often tens of thousands) and the small number of time points many patterns are expected to arise at random. Most clustering algorithms are unable to distinguish between real and random patterns. Results: We present an algorithm specifically designed for clustering short time series expression data. Our algorithm works by assigning genes to a predefined set of model profiles that capture the potential distinct patterns that can be expected from the experiment. We discuss how to obtain such a set of profiles and how to determine the significance of each of these profiles. Significant profiles are retained for further analysis and can be combined to form clusters. We tested our method on both simulated and real biological data. 
Using immune response data we show that our algorithm can correctly detect the temporal profile of relevant functional categories. Using Gene Ontology analysis we show that our algorithm outperforms both general clustering algorithms and algorithms designed specifically for clustering time series gene expression data. Availability: Information on obtaining a Java implementation with a graphical user interface (GUI) is available from http://www.cs.cmu.edu/~jernst/st/ Contact: jernst@cs.cmu.edu Supplementary information: Available at http://www.cs.cmu.edu/~jernst/st/</description><subject>Algorithms</subject><subject>Cell Line, Tumor</subject><subject>Cluster Analysis</subject><subject>Computational Biology - methods</subject><subject>Computer Simulation</subject><subject>Gene Expression Profiling</subject><subject>Gene Expression Regulation</subject><subject>Helicobacter pylori - metabolism</subject><subject>Humans</subject><subject>Immune System</subject><subject>Internet</subject><subject>Models, Theoretical</subject><subject>Neoplasms - microbiology</subject><subject>Oligonucleotide Array Sequence Analysis</subject><subject>Programming Languages</subject><subject>Software</subject><subject>Time Factors</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkU1LAzEQhoMoVqt_oSwevK3Nd3aPUrSVVjyoIL2E7CZbU_ejJrtQ_70pXRS99DTD5Jk3DA8AIwRvEEzJOLONrYvGVaq1uR9nrUUQ4yNwhiiHMYYsPQ494SKmCSQDcO79GkKGKKWnYIBYyhFl5AygSdn51jhbryL_3rg2am1lIh8mxkcrU5vIbDfOeG-bOtKqVRfgpFClN5d9HYLX-7uXySxePE0fJreLOGcMtjHPMecacaUVF5BjoXlS5EplQguhWVLghFOUKkW4xoVRBWWJZkLAJEyTgpAhuN7nblzz2Rnfysr63JSlqk3TeclFSrlg-CAY4hJGKD0Mit3_Ig3g1T9w3XSuDtfuwrjAGMEA8T2Uu8Z7Zwq5cbZS7ksiKHeO5F9HsncUFkd9epdVRv-u9VICEO8BG8xsf96V-whHE8Hk7G0p6ez5cTqfT-SSfAOoEaCF</recordid><startdate>20050601</startdate><enddate>20050601</enddate><creator>Ernst, 
Jason</creator><creator>Nau, Gerard J.</creator><creator>Bar-Joseph, Ziv</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>20050601</creationdate><title>Clustering short time series gene expression data</title><author>Ernst, Jason ; Nau, Gerard J. 
; Bar-Joseph, Ziv</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c550t-6c266d16ada670627d68fcaab7d77d58f286419aa36d2feaf458d577084198f33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Algorithms</topic><topic>Cell Line, Tumor</topic><topic>Cluster Analysis</topic><topic>Computational Biology - methods</topic><topic>Computer Simulation</topic><topic>Gene Expression Profiling</topic><topic>Gene Expression Regulation</topic><topic>Helicobacter pylori - metabolism</topic><topic>Humans</topic><topic>Immune System</topic><topic>Internet</topic><topic>Models, Theoretical</topic><topic>Neoplasms - microbiology</topic><topic>Oligonucleotide Array Sequence Analysis</topic><topic>Programming Languages</topic><topic>Software</topic><topic>Time Factors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ernst, Jason</creatorcontrib><creatorcontrib>Nau, Gerard J.</creatorcontrib><creatorcontrib>Bar-Joseph, Ziv</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and 
Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ernst, Jason</au><au>Nau, Gerard J.</au><au>Bar-Joseph, Ziv</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Clustering short time series gene expression data</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2005-06-01</date><risdate>2005</risdate><volume>21</volume><issue>suppl-1</issue><spage>i159</spage><epage>i168</epage><pages>i159-i168</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Motivation: Time series expression experiments are used to study a wide range of biological systems. More than 80% of all time series expression datasets are short (8 time points or fewer). These datasets present unique challenges. 
On account of the large number of genes profiled (often tens of thousands) and the small number of time points many patterns are expected to arise at random. Most clustering algorithms are unable to distinguish between real and random patterns. Results: We present an algorithm specifically designed for clustering short time series expression data. Our algorithm works by assigning genes to a predefined set of model profiles that capture the potential distinct patterns that can be expected from the experiment. We discuss how to obtain such a set of profiles and how to determine the significance of each of these profiles. Significant profiles are retained for further analysis and can be combined to form clusters. We tested our method on both simulated and real biological data. Using immune response data we show that our algorithm can correctly detect the temporal profile of relevant functional categories. Using Gene Ontology analysis we show that our algorithm outperforms both general clustering algorithms and algorithms designed specifically for clustering time series gene expression data. Availability: Information on obtaining a Java implementation with a graphical user interface (GUI) is available from http://www.cs.cmu.edu/~jernst/st/ Contact: jernst@cs.cmu.edu Supplementary information: Available at http://www.cs.cmu.edu/~jernst/st/</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>15961453</pmid><doi>10.1093/bioinformatics/bti1022</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1367-4803
ispartof	Bioinformatics, 2005-06, Vol.21 (suppl-1), p.i159-i168
issn	1367-4803 1460-2059 1367-4811
language	eng
recordid	cdi_proquest_miscellaneous_67946752
source	MEDLINE; Oxford Journals Open Access Collection; EZB-FREE-00999 freely available EZB journals; Alma/SFX Local Collection
subjects	Algorithms; Cell Line, Tumor; Cluster Analysis; Computational Biology - methods; Computer Simulation; Gene Expression Profiling; Gene Expression Regulation; Helicobacter pylori - metabolism; Humans; Immune System; Internet; Models, Theoretical; Neoplasms - microbiology; Oligonucleotide Array Sequence Analysis; Programming Languages; Software; Time Factors
title	Clustering short time series gene expression data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T04%3A07%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Clustering%20short%20time%20series%20gene%20expression%20data&rft.jtitle=Bioinformatics&rft.au=Ernst,%20Jason&rft.date=2005-06-01&rft.volume=21&rft.issue=suppl-1&rft.spage=i159&rft.epage=i168&rft.pages=i159-i168&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/bti1022&rft_dat=%3Cproquest_cross%3E17577079%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198672210&rft_id=info:pmid/15961453&rfr_iscdi=true
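
Illustrative note: the description field in this record summarizes the method as assigning genes to a predefined set of model profiles and then determining the significance of each profile. The sketch below shows one way such a scheme could look. It is not the authors' Java implementation. The similarity measure (Pearson correlation), the permutation-based significance estimate, and all function names (`pearson`, `assign`, `profile_counts`, `permutation_pvalues`) are assumptions made for illustration, since the record does not specify them.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def assign(gene, profiles):
    """Index of the model profile most correlated with the gene's time series."""
    return max(range(len(profiles)), key=lambda i: pearson(gene, profiles[i]))


def profile_counts(genes, profiles):
    """Number of genes assigned to each model profile."""
    counts = [0] * len(profiles)
    for gene in genes:
        counts[assign(gene, profiles)] += 1
    return counts


def permutation_pvalues(genes, profiles, n_perm=200, seed=0):
    """Assumed significance test: shuffle the time points of every gene and
    count how often a profile attracts at least as many genes as observed."""
    rng = random.Random(seed)
    observed = profile_counts(genes, profiles)
    exceed = [0] * len(profiles)
    for _ in range(n_perm):
        shuffled = [rng.sample(gene, len(gene)) for gene in genes]
        null_counts = profile_counts(shuffled, profiles)
        for i, (null, obs) in enumerate(zip(null_counts, observed)):
            if null >= obs:
                exceed[i] += 1
    # Add-one correction keeps estimated p-values strictly above zero.
    return observed, [(e + 1) / (n_perm + 1) for e in exceed]


# Hypothetical model profiles over four time points: up, down, up-then-down.
profiles = [[0, 1, 2, 3], [0, -1, -2, -3], [0, 1, 0, -1]]
genes = [[0, 1, 2, 3], [0, 2, 4, 6], [0, -1, -2, -3]]
observed, pvalues = permutation_pvalues(genes, profiles)
```

With only a handful of time points, many gene series match some profile by chance, which is why the sketch compares the observed assignment counts against counts from shuffled data rather than reading the counts directly.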