Stochastic convex sparse principal component analysis

Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious dis...

Bibliographic details
Published in: EURASIP journal on bioinformatics & systems biology 2016-12, Vol.2016 (1), p.15-11, Article 15
Main authors: Baytas, Inci M., Lin, Kaixiang, Wang, Fei, Jain, Anil K., Zhou, Jiayu
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 11
container_issue 1
container_start_page 15
container_title EURASIP journal on bioinformatics & systems biology
container_volume 2016
creator Baytas, Inci M.
Lin, Kaixiang
Wang, Fei
Jain, Anil K.
Zhou, Jiayu
description Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious disadvantage of traditional PCA when it is applied to analyze data where interpretability is important. In applications, where the features have some physical meanings, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of principal components. The sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. Stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to the high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.
doi_str_mv 10.1186/s13637-016-0045-x
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5018037</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1859722720</sourcerecordid><originalsourceid>FETCH-LOGICAL-c536t-997f1656711216dd5740f6d8c0de2139a1d077074a5971c3477c06485db948813</originalsourceid><addsrcrecordid>eNqNkU1rGzEQhkVIiZ2PH5BLMeTSy7YafY10CYTQNgVDD23PQtbKyYb1aiOtg_Pvq8WJcQMFnySkZ955Z15CLoF-BtDqSwauOFYUVEWpkNXmiExBaawESH68uws5Iac5PxZGSYknZMJQKaq4nBL5a4j-weWh8TMfu-ewmeXepRxmfWo63_SuLe-rPnahG2auc-1LbvI5-bB0bQ4Xr-cZ-fPt6-_bu2r-8_uP25t55SVXQ2UMLkFJhQAMVF1LFHSpau1pHRhw46CmiBSFkwbBc4HoqRJa1gsjtAZ-Rq63uv16sQq1Lx6Sa22xtnLpxUbX2H9_uubB3sdnKyloyrEIfHoVSPFpHfJgV032oW1dF-I6W9ClM2PI6AGokGiUMewAlKGhWvNxgqt36GNcp7LGkSoehRB0pGBL-RRzTmG5GxGoHaO226htidqOUdtNqfm4v5tdxVu2BWBbII9R3oe01_q_qn8B3dSybA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1818044401</pqid></control><display><type>article</type><title>Stochastic convex sparse principal component analysis</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central Open Access</source><source>Springer Nature OA Free Journals</source><source>PubMed Central</source><creator>Baytas, Inci M. ; Lin, Kaixiang ; Wang, Fei ; Jain, Anil K. ; Zhou, Jiayu</creator><creatorcontrib>Baytas, Inci M. ; Lin, Kaixiang ; Wang, Fei ; Jain, Anil K. ; Zhou, Jiayu</creatorcontrib><description>Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious disadvantage of traditional PCA when it is applied to analyze data where interpretability is important. 
In applications, where the features have some physical meanings, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of principal components. The sparse PCA can be formulated as an ℓ 1 regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. Stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to the high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. 
The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.</description><identifier>ISSN: 1687-4145</identifier><identifier>ISSN: 1687-4153</identifier><identifier>EISSN: 1687-4153</identifier><identifier>DOI: 10.1186/s13637-016-0045-x</identifier><identifier>PMID: 27660635</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><subject>Bioinformatics ; Biology ; Biomedical Engineering and Bioengineering ; Biomedical Informatics with Optimization and Machine Learning ; Computation ; Computational Biology/Bioinformatics ; Convergence ; Engineering ; Iterative methods ; Principal component analysis ; Signal,Image and Speech Processing ; Stochasticity ; Systems Biology ; Variance</subject><ispartof>EURASIP journal on bioinformatics &amp; systems biology, 2016-12, Vol.2016 (1), p.15-11, Article 15</ispartof><rights>The Author(s) 2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c536t-997f1656711216dd5740f6d8c0de2139a1d077074a5971c3477c06485db948813</citedby><cites>FETCH-LOGICAL-c536t-997f1656711216dd5740f6d8c0de2139a1d077074a5971c3477c06485db948813</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018037/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018037/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,725,778,782,883,27911,27912,41107,42176,51563,53778,53780</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/27660635$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Baytas, Inci M.</creatorcontrib><creatorcontrib>Lin, 
Kaixiang</creatorcontrib><creatorcontrib>Wang, Fei</creatorcontrib><creatorcontrib>Jain, Anil K.</creatorcontrib><creatorcontrib>Zhou, Jiayu</creatorcontrib><title>Stochastic convex sparse principal component analysis</title><title>EURASIP journal on bioinformatics &amp; systems biology</title><addtitle>J Bioinform Sys Biology</addtitle><addtitle>EURASIP J Bioinform Syst Biol</addtitle><description>Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious disadvantage of traditional PCA when it is applied to analyze data where interpretability is important. In applications, where the features have some physical meanings, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of principal components. The sparse PCA can be formulated as an ℓ 1 regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. Stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to the high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance reduced stochastic scheme to achieve a geometric convergence rate. 
We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.</description><subject>Bioinformatics</subject><subject>Biology</subject><subject>Biomedical Engineering and Bioengineering</subject><subject>Biomedical Informatics with Optimization and Machine Learning</subject><subject>Computation</subject><subject>Computational Biology/Bioinformatics</subject><subject>Convergence</subject><subject>Engineering</subject><subject>Iterative methods</subject><subject>Principal component analysis</subject><subject>Signal,Image and Speech Processing</subject><subject>Stochasticity</subject><subject>Systems Biology</subject><subject>Variance</subject><issn>1687-4145</issn><issn>1687-4153</issn><issn>1687-4153</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNqNkU1rGzEQhkVIiZ2PH5BLMeTSy7YafY10CYTQNgVDD23PQtbKyYb1aiOtg_Pvq8WJcQMFnySkZ955Z15CLoF-BtDqSwauOFYUVEWpkNXmiExBaawESH68uws5Iac5PxZGSYknZMJQKaq4nBL5a4j-weWh8TMfu-ewmeXepRxmfWo63_SuLe-rPnahG2auc-1LbvI5-bB0bQ4Xr-cZ-fPt6-_bu2r-8_uP25t55SVXQ2UMLkFJhQAMVF1LFHSpau1pHRhw46CmiBSFkwbBc4HoqRJa1gsjtAZ-Rq63uv16sQq1Lx6Sa22xtnLpxUbX2H9_uubB3sdnKyloyrEIfHoVSPFpHfJgV032oW1dF-I6W9ClM2PI6AGokGiUMewAlKGhWvNxgqt36GNcp7LGkSoehRB0pGBL-RRzTmG5GxGoHaO226htidqOUdtNqfm4v5tdxVu2BWBbII9R3oe01_q_qn8B3dSybA</recordid><startdate>20161201</startdate><enddate>20161201</enddate><creator>Baytas, Inci M.</creator><creator>Lin, Kaixiang</creator><creator>Wang, Fei</creator><creator>Jain, Anil K.</creator><creator>Zhou, Jiayu</creator><general>Springer 
International Publishing</general><general>Springer Nature B.V</general><scope>C6C</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>7XB</scope><scope>8AL</scope><scope>8BQ</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>F28</scope><scope>FR3</scope><scope>GNUQQ</scope><scope>H8D</scope><scope>H8G</scope><scope>HCIFZ</scope><scope>JG9</scope><scope>JQ2</scope><scope>K7-</scope><scope>KR7</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20161201</creationdate><title>Stochastic convex sparse principal component analysis</title><author>Baytas, Inci M. ; Lin, Kaixiang ; Wang, Fei ; Jain, Anil K. 
; Zhou, Jiayu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c536t-997f1656711216dd5740f6d8c0de2139a1d077074a5971c3477c06485db948813</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Bioinformatics</topic><topic>Biology</topic><topic>Biomedical Engineering and Bioengineering</topic><topic>Biomedical Informatics with Optimization and Machine Learning</topic><topic>Computation</topic><topic>Computational Biology/Bioinformatics</topic><topic>Convergence</topic><topic>Engineering</topic><topic>Iterative methods</topic><topic>Principal component analysis</topic><topic>Signal,Image and Speech Processing</topic><topic>Stochasticity</topic><topic>Systems Biology</topic><topic>Variance</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Baytas, Inci M.</creatorcontrib><creatorcontrib>Lin, Kaixiang</creatorcontrib><creatorcontrib>Wang, Fei</creatorcontrib><creatorcontrib>Jain, Anil K.</creatorcontrib><creatorcontrib>Zhou, Jiayu</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni 
Edition)</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>Natural Science Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>ProQuest Central Student</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>SciTech Premium Collection</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies 
&amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>EURASIP journal on bioinformatics &amp; systems biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Baytas, Inci M.</au><au>Lin, Kaixiang</au><au>Wang, Fei</au><au>Jain, Anil K.</au><au>Zhou, Jiayu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Stochastic convex sparse principal component analysis</atitle><jtitle>EURASIP journal on bioinformatics &amp; systems biology</jtitle><stitle>J Bioinform Sys Biology</stitle><addtitle>EURASIP J Bioinform Syst Biol</addtitle><date>2016-12-01</date><risdate>2016</risdate><volume>2016</volume><issue>1</issue><spage>15</spage><epage>11</epage><pages>15-11</pages><artnum>15</artnum><issn>1687-4145</issn><issn>1687-4153</issn><eissn>1687-4153</eissn><abstract>Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious disadvantage of traditional PCA when it is applied to analyze data where interpretability is important. 
In applications, where the features have some physical meanings, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of principal components. The sparse PCA can be formulated as an ℓ 1 regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. Stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to the high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.</abstract><cop>Cham</cop><pub>Springer International Publishing</pub><pmid>27660635</pmid><doi>10.1186/s13637-016-0045-x</doi><tpages>11</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1687-4145
ispartof EURASIP journal on bioinformatics & systems biology, 2016-12, Vol.2016 (1), p.15-11, Article 15
issn 1687-4145
1687-4153
1687-4153
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5018037
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; Springer Nature OA Free Journals; PubMed Central
subjects Bioinformatics
Biology
Biomedical Engineering and Bioengineering
Biomedical Informatics with Optimization and Machine Learning
Computation
Computational Biology/Bioinformatics
Convergence
Engineering
Iterative methods
Principal component analysis
Signal,Image and Speech Processing
Stochasticity
Systems Biology
Variance
title Stochastic convex sparse principal component analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T04%3A03%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Stochastic%20convex%20sparse%20principal%20component%20analysis&rft.jtitle=EURASIP%20journal%20on%20bioinformatics%20&%20systems%20biology&rft.au=Baytas,%20Inci%20M.&rft.date=2016-12-01&rft.volume=2016&rft.issue=1&rft.spage=15&rft.epage=11&rft.pages=15-11&rft.artnum=15&rft.issn=1687-4145&rft.eissn=1687-4153&rft_id=info:doi/10.1186/s13637-016-0045-x&rft_dat=%3Cproquest_pubme%3E1859722720%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1818044401&rft_id=info:pmid/27660635&rfr_iscdi=true