Robust biclustering by sparse singular value decomposition incorporating stability selection

Motivation: Over the past decade, several biclustering approaches have been published in the field of gene expression data analysis. Despite the huge diversity in the mathematical concepts underlying the different biclustering methods, many of them can be related to the singular value decomposition (S...

Detailed description

Bibliographic details
Published in: Bioinformatics 2011-08, Vol.27 (15), p.2089-2097
Main authors: Sill, Martin; Kaiser, Sebastian; Benner, Axel; Kopp-Schneider, Annette
Format: Article
Language: eng
container_end_page 2097
container_issue 15
container_start_page 2089
container_title Bioinformatics
container_volume 27
creator Sill, Martin
Kaiser, Sebastian
Benner, Axel
Kopp-Schneider, Annette
description Motivation: Over the past decade, several biclustering approaches have been published in the field of gene expression data analysis. Despite the huge diversity in the mathematical concepts underlying the different biclustering methods, many of them can be related to the singular value decomposition (SVD). Recently, a sparse SVD approach (SSVD) has been proposed to reveal biclusters in gene expression data. In this article, we propose to incorporate stability selection to improve this method. Stability selection is a subsampling-based variable selection method that allows Type I error rates to be controlled. The S4VD algorithm proposed here incorporates this subsampling approach to find stable biclusters and to estimate the probabilities with which genes and samples are selected as belonging to the biclusters. Results: To date, S4VD is the first biclustering approach that takes into account the stability of clusters under perturbations of the data. Application of the S4VD algorithm to a lung cancer microarray dataset revealed biclusters that correspond to coregulated genes associated with cancer subtypes. Marker genes for different lung cancer subtypes showed high selection probabilities for the corresponding biclusters. Moreover, the genes associated with the biclusters belong to significantly enriched cancer-related Gene Ontology categories. In a simulation study, the S4VD algorithm outperformed the SSVD algorithm and two other SVD-related biclustering methods both in recovering artificial biclusters and in robustness to noisy data. Availability: R code for the S4VD algorithm, together with documentation, can be found at http://s4vd.r-forge.r-project.org/. Contact: m.sill@dkfz.de Supplementary information: Supplementary data are available at Bioinformatics online.
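The idea of combining a sparse SVD with subsampling, as summarized in the description, can be illustrated with a small sketch. This is not the authors' R implementation: the function names (`sparse_rank1`, `row_selection_probabilities`), the soft-thresholding penalty, the penalty values, the 0.8 cutoff and the toy data are all assumptions chosen for illustration. The sketch fits a rank-1 sparse SVD on random halves of the samples (columns) and reports how often each gene (row) receives a nonzero loading.

```python
import random

def soft_threshold(vec, lam):
    """Shrink every entry towards zero by lam; entries within [-lam, lam] become 0."""
    return [max(abs(x) - lam, 0.0) * (1.0 if x >= 0 else -1.0) for x in vec]

def normalize(vec):
    """Scale to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)

def sparse_rank1(X, lam_u, lam_v, n_iter=20):
    """Rank-1 sparse SVD sketch: alternating soft-thresholded power iterations."""
    n_rows, n_cols = len(X), len(X[0])
    u = [0.0] * n_rows
    v = normalize([1.0] * n_cols)
    for _ in range(n_iter):
        Xv = [sum(X[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        u = normalize(soft_threshold(Xv, lam_u))
        Xtu = [sum(X[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        v = normalize(soft_threshold(Xtu, lam_v))
    return u, v

def row_selection_probabilities(X, lam_u, lam_v, n_subsamples=100, seed=0):
    """Fraction of column subsamples in which each row gets a nonzero loading."""
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    counts = [0] * n_rows
    for _ in range(n_subsamples):
        cols = rng.sample(range(n_cols), n_cols // 2)  # random half of the samples
        sub = [[row[j] for j in cols] for row in X]
        u, _ = sparse_rank1(sub, lam_u, lam_v)
        for i, u_i in enumerate(u):
            counts[i] += u_i != 0.0
    return [c / n_subsamples for c in counts]

# Toy data (assumed): 20 genes x 12 samples, with one planted bicluster
# covering genes 0-4 and samples 0-7, plus Gaussian noise.
data_rng = random.Random(1)
X = [[(5.0 if i < 5 and j < 8 else 0.0) + data_rng.gauss(0.0, 0.3)
      for j in range(12)] for i in range(20)]

probs = row_selection_probabilities(X, lam_u=1.5, lam_v=1.5)
stable_rows = [i for i, p in enumerate(probs) if p >= 0.8]  # assumed cutoff
print(stable_rows)
```

Keeping only rows whose selection frequency exceeds a cutoff is what makes the result insensitive to any single subsample; the same loop with rows and columns swapped would give selection probabilities for the samples.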
doi_str_mv 10.1093/bioinformatics/btr322
format Article
fullrecord <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_proquest_miscellaneous_878029011</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btr322</oup_id><sourcerecordid>878029011</sourcerecordid><originalsourceid>FETCH-LOGICAL-c379t-6f557ffe16542dde799fee16a203c6ba9b036fedc4c118fc64ca2bd33d4a865d3</originalsourceid><addsrcrecordid>eNqNkE1LxDAQhoMo7rr6E5RexFPdfDVtj7L4BYIgehNKkiYSSZuaaYX992bZVfEmc5gZeN55hxehU4IvCa7ZUrngehtiJ0enYanGyCjdQ3PCBc4pLur9NDNR5rzCbIaOAN4xLgjn_BDNKBFMFHU5R69PQU0wZsppn7qJrn_L1DqDQUYwGaR18jJmn9JPJmuNDt0QwI0u9JnrdYhDiOmDJIJRKufdmLTGG70hjtGBlR7Mya4v0MvN9fPqLn94vL1fXT3kmpX1mAtbFKW1hoiC07Y1ZV1bkzZJMdNCyVphJqxpNdeEVFYLriVVLWMtl5UoWrZAF9u7Qwwfk4Gx6Rxo473sTZigqcoK0xoTkshiS-oYAKKxzRBdJ-O6IbjZ5Nr8zbXZ5pp0ZzuHSXWm_VF9B5mA8x0gQUtvo-y1g1-Os1RllTi85cI0_NP7C1lbm2A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>878029011</pqid></control><display><type>article</type><title>Robust biclustering by sparse singular value decomposition incorporating stability selection</title><source>Oxford Journals Open Access Collection</source><creator>Sill, Martin ; Kaiser, Sebastian ; Benner, Axel ; Kopp-Schneider, Annette</creator><creatorcontrib>Sill, Martin ; Kaiser, Sebastian ; Benner, Axel ; Kopp-Schneider, Annette</creatorcontrib><description>Motivation: Over the past decade, several biclustering approaches have been published in the field of gene expression data analysis. Despite of huge diversity regarding the mathematical concepts of the different biclustering methods, many of them can be related to the singular value decomposition (SVD). Recently, a sparse SVD approach (SSVD) has been proposed to reveal biclusters in gene expression data. In this article, we propose to incorporate stability selection to improve this method. 
Stability selection is a subsampling-based variable selection that allows to control Type I error rates. The here proposed S4VD algorithm incorporates this subsampling approach to find stable biclusters, and to estimate the selection probabilities of genes and samples to belong to the biclusters. Results: So far, the S4VD method is the first biclustering approach that takes the cluster stability regarding perturbations of the data into account. Application of the S4VD algorithm to a lung cancer microarray dataset revealed biclusters that correspond to coregulated genes associated with cancer subtypes. Marker genes for different lung cancer subtypes showed high selection probabilities to belong to the corresponding biclusters. Moreover, the genes associated with the biclusters belong to significantly enriched cancer-related Gene Ontology categories. In a simulation study, the S4VD algorithm outperformed the SSVD algorithm and two other SVD-related biclustering methods in recovering artificial biclusters and in being robust to noisy data. Availability: R-Code of the S4VD algorithm as well as a documentation can be found at http://s4vd.r-forge.r-project.org/. Contact: m.sill@dkfz.de Supplementary information: Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btr322</identifier><identifier>PMID: 21636597</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Algorithms ; Biological and medical sciences ; Cluster Analysis ; Computational Biology - methods ; Computer Simulation ; Fundamental and applied biological sciences. Psychology ; Gene Expression Profiling - methods ; General aspects ; Humans ; Lung Neoplasms - genetics ; Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects) ; Oligonucleotide Array Sequence Analysis</subject><ispartof>Bioinformatics, 2011-08, Vol.27 (15), p.2089-2097</ispartof><rights>The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com 2011</rights><rights>2015 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c379t-6f557ffe16542dde799fee16a203c6ba9b036fedc4c118fc64ca2bd33d4a865d3</citedby><cites>FETCH-LOGICAL-c379t-6f557ffe16542dde799fee16a203c6ba9b036fedc4c118fc64ca2bd33d4a865d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1604,27924,27925</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btr322$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=24343478$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/21636597$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Sill, Martin</creatorcontrib><creatorcontrib>Kaiser, Sebastian</creatorcontrib><creatorcontrib>Benner, Axel</creatorcontrib><creatorcontrib>Kopp-Schneider, Annette</creatorcontrib><title>Robust biclustering by sparse singular value decomposition incorporating stability selection</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: Over the past decade, several biclustering approaches have been published in the field of gene expression data analysis. 
Despite of huge diversity regarding the mathematical concepts of the different biclustering methods, many of them can be related to the singular value decomposition (SVD). Recently, a sparse SVD approach (SSVD) has been proposed to reveal biclusters in gene expression data. In this article, we propose to incorporate stability selection to improve this method. Stability selection is a subsampling-based variable selection that allows to control Type I error rates. The here proposed S4VD algorithm incorporates this subsampling approach to find stable biclusters, and to estimate the selection probabilities of genes and samples to belong to the biclusters. Results: So far, the S4VD method is the first biclustering approach that takes the cluster stability regarding perturbations of the data into account. Application of the S4VD algorithm to a lung cancer microarray dataset revealed biclusters that correspond to coregulated genes associated with cancer subtypes. Marker genes for different lung cancer subtypes showed high selection probabilities to belong to the corresponding biclusters. Moreover, the genes associated with the biclusters belong to significantly enriched cancer-related Gene Ontology categories. In a simulation study, the S4VD algorithm outperformed the SSVD algorithm and two other SVD-related biclustering methods in recovering artificial biclusters and in being robust to noisy data. Availability: R-Code of the S4VD algorithm as well as a documentation can be found at http://s4vd.r-forge.r-project.org/. Contact: m.sill@dkfz.de Supplementary information: Supplementary data are available at Bioinformatics online.</description><subject>Algorithms</subject><subject>Biological and medical sciences</subject><subject>Cluster Analysis</subject><subject>Computational Biology - methods</subject><subject>Computer Simulation</subject><subject>Fundamental and applied biological sciences. 
Psychology</subject><subject>Gene Expression Profiling - methods</subject><subject>General aspects</subject><subject>Humans</subject><subject>Lung Neoplasms - genetics</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)</subject><subject>Oligonucleotide Array Sequence Analysis</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkE1LxDAQhoMo7rr6E5RexFPdfDVtj7L4BYIgehNKkiYSSZuaaYX992bZVfEmc5gZeN55hxehU4IvCa7ZUrngehtiJ0enYanGyCjdQ3PCBc4pLur9NDNR5rzCbIaOAN4xLgjn_BDNKBFMFHU5R69PQU0wZsppn7qJrn_L1DqDQUYwGaR18jJmn9JPJmuNDt0QwI0u9JnrdYhDiOmDJIJRKufdmLTGG70hjtGBlR7Mya4v0MvN9fPqLn94vL1fXT3kmpX1mAtbFKW1hoiC07Y1ZV1bkzZJMdNCyVphJqxpNdeEVFYLriVVLWMtl5UoWrZAF9u7Qwwfk4Gx6Rxo473sTZigqcoK0xoTkshiS-oYAKKxzRBdJ-O6IbjZ5Nr8zbXZ5pp0ZzuHSXWm_VF9B5mA8x0gQUtvo-y1g1-Os1RllTi85cI0_NP7C1lbm2A</recordid><startdate>20110801</startdate><enddate>20110801</enddate><creator>Sill, Martin</creator><creator>Kaiser, Sebastian</creator><creator>Benner, Axel</creator><creator>Kopp-Schneider, Annette</creator><general>Oxford University Press</general><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20110801</creationdate><title>Robust biclustering by sparse singular value decomposition incorporating stability selection</title><author>Sill, Martin ; Kaiser, Sebastian ; Benner, Axel ; Kopp-Schneider, 
Annette</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c379t-6f557ffe16542dde799fee16a203c6ba9b036fedc4c118fc64ca2bd33d4a865d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Algorithms</topic><topic>Biological and medical sciences</topic><topic>Cluster Analysis</topic><topic>Computational Biology - methods</topic><topic>Computer Simulation</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Gene Expression Profiling - methods</topic><topic>General aspects</topic><topic>Humans</topic><topic>Lung Neoplasms - genetics</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)</topic><topic>Oligonucleotide Array Sequence Analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sill, Martin</creatorcontrib><creatorcontrib>Kaiser, Sebastian</creatorcontrib><creatorcontrib>Benner, Axel</creatorcontrib><creatorcontrib>Kopp-Schneider, Annette</creatorcontrib><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Sill, Martin</au><au>Kaiser, Sebastian</au><au>Benner, Axel</au><au>Kopp-Schneider, Annette</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Robust biclustering by sparse singular value decomposition incorporating stability 
selection</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2011-08-01</date><risdate>2011</risdate><volume>27</volume><issue>15</issue><spage>2089</spage><epage>2097</epage><pages>2089-2097</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Motivation: Over the past decade, several biclustering approaches have been published in the field of gene expression data analysis. Despite of huge diversity regarding the mathematical concepts of the different biclustering methods, many of them can be related to the singular value decomposition (SVD). Recently, a sparse SVD approach (SSVD) has been proposed to reveal biclusters in gene expression data. In this article, we propose to incorporate stability selection to improve this method. Stability selection is a subsampling-based variable selection that allows to control Type I error rates. The here proposed S4VD algorithm incorporates this subsampling approach to find stable biclusters, and to estimate the selection probabilities of genes and samples to belong to the biclusters. Results: So far, the S4VD method is the first biclustering approach that takes the cluster stability regarding perturbations of the data into account. Application of the S4VD algorithm to a lung cancer microarray dataset revealed biclusters that correspond to coregulated genes associated with cancer subtypes. Marker genes for different lung cancer subtypes showed high selection probabilities to belong to the corresponding biclusters. Moreover, the genes associated with the biclusters belong to significantly enriched cancer-related Gene Ontology categories. In a simulation study, the S4VD algorithm outperformed the SSVD algorithm and two other SVD-related biclustering methods in recovering artificial biclusters and in being robust to noisy data. Availability: R-Code of the S4VD algorithm as well as a documentation can be found at http://s4vd.r-forge.r-project.org/. 
Contact: m.sill@dkfz.de Supplementary information: Supplementary data are available at Bioinformatics online.</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>21636597</pmid><doi>10.1093/bioinformatics/btr322</doi><tpages>9</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2011-08, Vol.27 (15), p.2089-2097
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_878029011
source Oxford Journals Open Access Collection
subjects Algorithms
Biological and medical sciences
Cluster Analysis
Computational Biology - methods
Computer Simulation
Fundamental and applied biological sciences. Psychology
Gene Expression Profiling - methods
General aspects
Humans
Lung Neoplasms - genetics
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Oligonucleotide Array Sequence Analysis
title Robust biclustering by sparse singular value decomposition incorporating stability selection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T19%3A23%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20biclustering%20by%20sparse%20singular%20value%20decomposition%20incorporating%20stability%20selection&rft.jtitle=Bioinformatics&rft.au=Sill,%20Martin&rft.date=2011-08-01&rft.volume=27&rft.issue=15&rft.spage=2089&rft.epage=2097&rft.pages=2089-2097&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btr322&rft_dat=%3Cproquest_TOX%3E878029011%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=878029011&rft_id=info:pmid/21636597&rft_oup_id=10.1093/bioinformatics/btr322&rfr_iscdi=true