Sparse group factor analysis for biclustering of multiple data sources

Motivation: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments. These biclustering techniques have focused on one data sourc...

Bibliographic details
Published in:	arXiv.org 2016-04
Main authors:	Bunte, Kerstin; Leppäaho, Eemeli; Saarinen, Inka; Kaski, Samuel
Format:	Article
Language:	eng
Subjects:	Bayesian analysis; Computer Science - Information Retrieval; Computer Science - Learning; Computer simulation; Data sources; Deoxyribonucleic acid; DNA; DNA methylation; Factor analysis; Gene expression; Proteins; Sensitivity; Statistics - Machine Learning
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Bunte, Kerstin; Leppäaho, Eemeli; Saarinen, Inka; Kaski, Samuel
description	Motivation: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments. These biclustering techniques have focused on one data source, often gene expression data. We present a Bayesian approach for joint biclustering of multiple data sources, extending a recent method, Group Factor Analysis (GFA), to have a biclustering interpretation with additional sparsity assumptions. The resulting method enables data-driven detection of linear structure present in parts of the data sources. Results: Our simulation studies show that the proposed method reliably infers biclusters from heterogeneous data sources. We tested the method on data from the NCI-DREAM drug sensitivity prediction challenge, resulting in excellent prediction accuracy. Moreover, the predictions are based on several biclusters, which provide insight into the data sources, in this case on gene expression, DNA methylation, protein abundance, exome sequence, functional connectivity fingerprints and drug sensitivity.
doi_str_mv	10.48550/arxiv.1512.08808
format	Article
publisher	Ithaca: Cornell University Library, arXiv.org
creationdate	2016-04-21
sourcetype	Open Access Repository
rights	2016. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa	free_for_read
backlink	https://doi.org/10.1093/bioinformatics/btw207 (View published paper; access to full text may be restricted)
backlink	https://doi.org/10.48550/arXiv.1512.08808 (View paper in arXiv)
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2016-04
issn	2331-8422
language	eng
recordid	cdi_arxiv_primary_1512_08808
source	arXiv.org; Free E- Journals
subjects	Bayesian analysis; Computer Science - Information Retrieval; Computer Science - Learning; Computer simulation; Data sources; Deoxyribonucleic acid; DNA; DNA methylation; Factor analysis; Gene expression; Proteins; Sensitivity; Statistics - Machine Learning
title	Sparse group factor analysis for biclustering of multiple data sources
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T02%3A47%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sparse%20group%20factor%20analysis%20for%20biclustering%20of%20multiple%20data%20sources&rft.jtitle=arXiv.org&rft.au=Bunte,%20Kerstin&rft.date=2016-04-21&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.1512.08808&rft_dat=%3Cproquest_arxiv%3E2080900074%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2080900074&rft_id=info:pmid/&rfr_iscdi=true