Stochastic convex sparse principal component analysis
Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious dis...
Published in: | EURASIP journal on bioinformatics & systems biology 2016-12, Vol.2016 (1), Article 15 (11 pages) |
---|---|
Main authors: | Baytas, Inci M. ; Lin, Kaixiang ; Wang, Fei ; Jain, Anil K. ; Zhou, Jiayu |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 11 |
---|---|
container_issue | 1 |
container_start_page | 15 |
container_title | EURASIP journal on bioinformatics & systems biology |
container_volume | 2016 |
creator | Baytas, Inci M. ; Lin, Kaixiang ; Wang, Fei ; Jain, Anil K. ; Zhou, Jiayu |
description | Principal component analysis (PCA) is a dimensionality reduction and data analysis tool commonly used in many areas. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance present in the data. However, there is an obvious disadvantage of traditional PCA when it is applied to analyze data where interpretability is important. In applications, where the features have some physical meanings, we lose the ability to interpret the principal components extracted by conventional PCA because each principal component is a linear combination of all the original features. For this reason, sparse PCA has been proposed to improve the interpretability of traditional PCA by introducing sparsity to the loading vectors of principal components. The sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods. However, these methods do not scale well because computation of the exact gradient is generally required at each iteration. Stochastic gradient framework addresses this challenge by computing an expected gradient at each iteration. Nevertheless, stochastic approaches typically have low convergence rates due to the high variance. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition which allows a broader class of objectives to be applied. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort. |
doi_str_mv | 10.1186/s13637-016-0045-x |
format | Article |
fullrecord | record type: article; source: proquest_pubme; publisher: Cham: Springer International Publishing; PMID: 27660635; DOI: 10.1186/s13637-016-0045-x; ISSN: 1687-4145, 1687-4153; EISSN: 1687-4153; abbreviated journal titles: J Bioinform Sys Biology, EURASIP J Bioinform Syst Biol; publication date: 2016-12-01; volume: 2016; issue: 1; article number: 15; total pages: 11; rights: The Author(s) 2016; peer reviewed; open access (free to read); full text (HTML): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018037/ ; full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018037/pdf/ ; MEDLINE/PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/27660635 |
fulltext | fulltext |
identifier | ISSN: 1687-4145 |
ispartof | EURASIP journal on bioinformatics & systems biology, 2016-12, Vol.2016 (1), Article 15 (11 pages) |
issn | 1687-4145 ; 1687-4153 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5018037 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; Springer Nature OA Free Journals; PubMed Central |
subjects | Bioinformatics ; Biology ; Biomedical Engineering and Bioengineering ; Biomedical Informatics with Optimization and Machine Learning ; Computation ; Computational Biology/Bioinformatics ; Convergence ; Engineering ; Iterative methods ; Principal component analysis ; Signal, Image and Speech Processing ; Stochasticity ; Systems Biology ; Variance |
title | Stochastic convex sparse principal component analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T04%3A03%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Stochastic%20convex%20sparse%20principal%20component%20analysis&rft.jtitle=EURASIP%20journal%20on%20bioinformatics%20&%20systems%20biology&rft.au=Baytas,%20Inci%20M.&rft.date=2016-12-01&rft.volume=2016&rft.issue=1&rft.spage=15&rft.epage=11&rft.pages=15-11&rft.artnum=15&rft.issn=1687-4145&rft.eissn=1687-4153&rft_id=info:doi/10.1186/s13637-016-0045-x&rft_dat=%3Cproquest_pubme%3E1859722720%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1818044401&rft_id=info:pmid/27660635&rfr_iscdi=true |
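The description above says that sparse PCA can be posed as an ℓ1-regularized problem solved by proximal gradient methods, and that a proximal variance-reduced stochastic scheme avoids computing the exact gradient at every iteration. The following is a minimal sketch of that general idea, not the authors' Cvx-SPCA implementation. The quadratic objective, the function names `soft_threshold` and `prox_svrg_l1`, and the parameters `lam`, `gamma`, `step`, and `b` are all assumptions chosen for illustration; the record does not state the paper's exact objective.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_svrg_l1(X, b, lam, gamma, step, n_epochs=30, seed=0):
    """Proximal variance-reduced stochastic gradient on an illustrative objective.

    Minimises (1/n) * sum_i f_i(z) + gamma * ||z||_1 with the ASSUMED components
        f_i(z) = 0.5 * z^T (lam * I - x_i x_i^T) z - b^T z,
    which is convex on average when lam exceeds the largest eigenvalue of
    S = (1/n) * X^T X. This objective is hypothetical, not quoted from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z_snap = np.zeros(d)
    for _ in range(n_epochs):
        # Exact gradient of the smooth part, computed once per epoch at the snapshot.
        full_grad = lam * z_snap - X.T @ (X @ z_snap) / n - b
        z = z_snap.copy()
        for _ in range(n):
            x = X[rng.integers(n)]
            diff = z - z_snap
            # grad f_i(z) - grad f_i(z_snap) + full_grad: an unbiased gradient
            # estimate whose variance shrinks as z and z_snap approach the optimum.
            g = lam * diff - x * (x @ diff) + full_grad
            # Gradient step on the smooth part, then the l1 proximal step.
            z = soft_threshold(z - step * g, step * gamma)
        # Variant that keeps the last inner iterate as the next snapshot.
        z_snap = z
    return z_snap
```

In this sketch only one full pass over the data is needed per epoch for `full_grad`; every inner update touches a single sample. A conservative `step`, for example a small fraction of `1 / max(lam, max_i ||x_i||^2)`, is an assumption that keeps the iteration stable on well-scaled data. Larger `gamma` drives more entries of the returned vector to exactly zero.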