Interpolation based consensus clustering for gene expression time series

Unsupervised analyses such as clustering are essential tools for interpreting time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are...

Bibliographic details
Published in:	BMC bioinformatics 2015-04, Vol.16 (1), p.117-117, Article 117
Main authors:	Chiu, Tai-Yu; Hsu, Ting-Chieh; Yen, Chia-Cheng; Wang, Jia-Shung
Format:	Article
Language:	English
Subjects:	Algorithms; Analysis; Cell Cycle - physiology; Cluster Analysis; Computer Graphics; Consensus Sequence; Galactose - metabolism; Gene expression; Gene Expression Profiling - methods; Gene Expression Regulation, Fungal; Information management; Oligonucleotide Array Sequence Analysis - methods; Saccharomyces cerevisiae - genetics; Saccharomyces cerevisiae Proteins - genetics; Spores, Fungal - physiology; Time Factors
Online access:	Full text

container_end_page	117
container_issue	1
container_start_page	117
container_title	BMC bioinformatics
container_volume	16
creator	Chiu, Tai-Yu; Hsu, Ting-Chieh; Yen, Chia-Cheng; Wang, Jia-Shung
description	Unsupervised analyses such as clustering are essential tools for interpreting time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are popular for their simplicity. However, because of noise and measurement uncertainty, these common algorithms have low accuracy. Moreover, because gene expression is a temporal process, the relationship between successive time points should be considered in the analyses. In addition, biological processes are generally continuous, so time-series experiments that sample them at discrete points often yield an insufficient number of data points; as a result, compensating for missing data can also be an issue. An affinity propagation-based clustering algorithm for time-series gene expression data is proposed. The algorithm explores the relationships between genes using a sliding-window mechanism to extract a large number of features. In addition, the time-course datasets are resampled with spline interpolation to predict the unobserved values. Finally, a consensus process is applied to enhance the robustness of the method. Several real gene expression datasets were analyzed to demonstrate the accuracy and efficiency of the algorithm. The proposed algorithm benefits from the combination of cubic B-spline interpolation, sliding windows, affinity propagation, a gene relativity graph, and a consensus process, and as a result provides both appropriate and effective clustering of time-series gene expression data. The proposed method was tested on gene expression data from the Yeast galactose dataset, the Yeast cell-cycle dataset (Y5), and the Yeast sporulation dataset; the results illustrated the relationships between the expressed genes, which may provide insight into the biological processes involved.
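The description names the main stages (spline resampling, sliding windows, affinity propagation, consensus) without giving their details. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: a natural cubic interpolating spline stands in for cubic B-splines, the similarity inside each window is the negative squared Euclidean distance with the median similarity as the affinity propagation preference (a common default), and the consensus is a co-association vote merged by connected components. The gene relativity graph is not modeled, and every function and parameter name (`resample`, `affinity_propagation`, `consensus_clusters`, `width`, `threshold`) is hypothetical.

```python
# Hypothetical sketch (not the authors' code); assumptions are noted per function.
import bisect
import statistics

def natural_cubic_spline(xs, ys):
    """Natural cubic interpolating spline (assumed stand-in for cubic B-splines); xs strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends fix m[0] = m[n] = 0
    if n > 1:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        a = [h[i - 1] for i in range(1, n)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        c = [h[i] for i in range(1, n)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for j in range(1, n - 1):
            w = a[j] / b[j - 1]
            b[j] -= w * c[j - 1]
            d[j] -= w * d[j - 1]
        m[n - 1] = d[-1] / b[-1]
        for j in range(n - 3, -1, -1):
            m[j + 1] = (d[j] - c[j] * m[j + 2]) / b[j]

    def f(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)  # segment index
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t1)
    return f

def resample(times, values, step):
    """Evaluate the spline on a uniform grid, filling in unobserved time points."""
    f = natural_cubic_spline(times, values)
    count = int(round((times[-1] - times[0]) / step))
    return [f(times[0] + s * step) for s in range(count + 1)]

def affinity_propagation(S, damping=0.5, iterations=200):
    """Damped responsibility/availability updates; S[k][k] is the preference of k."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):
            v = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(v[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support from every i != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    ex = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not ex:  # no exemplar emerged: fall back to the strongest candidate
        ex = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    # Each gene is labelled by the index of its most similar exemplar.
    return [i if i in ex else max(ex, key=lambda k: S[i][k]) for i in range(n)]

def window_similarity(profiles, start, width):
    """Assumed feature: negative squared distances inside one window, median preference."""
    seg = [p[start:start + width] for p in profiles]
    n = len(seg)
    S = [[-sum((a - b) ** 2 for a, b in zip(seg[i], seg[j])) for j in range(n)]
         for i in range(n)]
    pref = statistics.median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref
    return S

def consensus_clusters(profiles, width, threshold=0.5):
    """Run AP on every sliding window, then merge genes that co-cluster often (assumed rule)."""
    n = len(profiles)
    runs = [affinity_propagation(window_similarity(profiles, s, width))
            for s in range(len(profiles[0]) - width + 1)]
    co = [[sum(r[i] == r[j] for r in runs) / len(runs) for j in range(n)]
          for i in range(n)]
    parent = list(range(n))  # union-find over gene pairs above the threshold
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return co, sorted(groups.values())

times = [0, 1, 2, 4]  # toy data: t = 3 was never measured
raw = [[s * t for t in times] for s in (1.0, 1.2, 1.4, -1.0, -1.2, -1.4)]
profiles = [resample(times, r, 1.0) for r in raw]
co, groups = consensus_clusters(profiles, width=3)
print(groups)
```

With these toy profiles, three windows of width 3 are clustered independently, and two genes share a consensus group only if they share an exemplar in at least half of the windows. A wider window smooths noise at the cost of temporal resolution; the 0.5 threshold is an arbitrary illustrative choice.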
doi_str_mv	10.1186/s12859-015-0541-0
format	Article
fullrecord	PMID: 25888019; publisher: BioMed Central Ltd, England; date: 2015-04-16; rights: COPYRIGHT 2015 BioMed Central Ltd. / Chiu et al.; licensee BioMed Central. 2015; peer reviewed; free to read; PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4407314/pdf/; HTML: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4407314/; MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25888019
fulltext	fulltext
identifier	ISSN: 1471-2105
ispartof	BMC bioinformatics, 2015-04, Vol.16 (1), p.117-117, Article 117
issn	1471-2105 (ISSN); 1471-2105 (EISSN)
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4407314
source	MEDLINE; DOAJ Directory of Open Access Journals; SpringerOpen (Open Access); SpringerLink (Online service); PubMed Central; EZB Electronic Journals Library; PubMed Central Open Access
subjects	Algorithms; Analysis; Cell Cycle - physiology; Cluster Analysis; Computer Graphics; Consensus Sequence; Galactose - metabolism; Gene expression; Gene Expression Profiling - methods; Gene Expression Regulation, Fungal; Information management; Oligonucleotide Array Sequence Analysis - methods; Saccharomyces cerevisiae - genetics; Saccharomyces cerevisiae Proteins - genetics; Spores, Fungal - physiology; Time Factors
title	Interpolation based consensus clustering for gene expression time series
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T09%3A20%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Interpolation%20based%20consensus%20clustering%20for%20gene%20expression%20time%20series&rft.jtitle=BMC%20bioinformatics&rft.au=Chiu,%20Tai-Yu&rft.date=2015-04-16&rft.volume=16&rft.issue=1&rft.spage=117&rft.epage=117&rft.pages=117-117&rft.artnum=117&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-015-0541-0&rft_dat=%3Cgale_pubme%3EA541357984%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1675869999&rft_id=info:pmid/25888019&rft_galeid=A541357984&rfr_iscdi=true