A Weighted Principal Component Analysis and Its Application to Gene Expression Data
In this work, we introduce in the first part new developments in Principal Component Analysis (PCA) and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the d...
Published in: | IEEE/ACM transactions on computational biology and bioinformatics 2011-01, Vol.8 (1), p.246-252 |
---|---|
Main authors: | Pinto da Costa, Joaquim F ; Alonso, H ; Roque, L |
Format: | Article |
Language: | eng |
container_end_page | 252 |
---|---|
container_issue | 1 |
container_start_page | 246 |
container_title | IEEE/ACM transactions on computational biology and bioinformatics |
container_volume | 8 |
creator | Pinto da Costa, Joaquim F ; Alonso, H ; Roque, L |
description | In this work, we introduce in the first part new developments in Principal Component Analysis (PCA) and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm. |
doi_str_mv | 10.1109/TCBB.2009.61 |
format | Article |
pmid | 21071812 |
publisher | United States: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Jan/Mar 2011 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1545-5963 |
ispartof | IEEE/ACM transactions on computational biology and bioinformatics, 2011-01, Vol.8 (1), p.246-252 |
issn | 1545-5963 ; 1557-9964 |
language | eng |
recordid | cdi_proquest_miscellaneous_849480515 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithm design and analysis ; Algorithms ; Artificial Intelligence ; Bioinformatics ; Computational Biology - methods ; Correlation ; Data analysis ; Data Mining ; Databases, Genetic ; Gene expression ; Gene Expression Profiling - methods ; gene selection ; Humans ; Iterative algorithms ; Metabolomics ; microarray data ; Noise level ; Noise robustness ; Oligonucleotide Array Sequence Analysis ; Principal component analysis ; Principal Component Analysis - methods ; Studies ; Support vector machines |
title | A Weighted Principal Component Analysis and Its Application to Gene Expression Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T14%3A06%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Weighted%20Principal%20Component%20Analysis%20and%20Its%20Application%20to%20Gene%20Expression%20Data&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Pinto%20da%20Costa,%20Joaquim%20F&rft.date=2011-01&rft.volume=8&rft.issue=1&rft.spage=246&rft.epage=252&rft.pages=246-252&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2009.61&rft_dat=%3Cproquest_RIE%3E849480515%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1030155140&rft_id=info:pmid/21071812&rft_ieee_id=5184803&rfr_iscdi=true |
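
The description above says that the weighted PCA comes from replacing Pearson's correlation coefficient with an alternative. The record does not give the authors' coefficient, so the sketch below only illustrates the general idea of running PCA on a substituted correlation matrix. As a stand-in it uses an ordinary sample-weighted Pearson correlation; the function names `weighted_corr` and `pca_from_corr` and the per-sample weights `w` are hypothetical and are not taken from the paper.

```python
import numpy as np


def weighted_corr(X, w):
    """Sample-weighted Pearson correlation matrix of the columns of X.

    ASSUMPTION: this is a generic stand-in, not the coefficient proposed
    in the paper. X has shape (n_samples, n_vars); w holds one
    nonnegative weight per sample. Columns must not be constant.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    mu = w @ X                           # weighted column means
    Xc = X - mu                          # center each column
    cov = (Xc * w[:, None]).T @ Xc       # weighted covariance matrix
    sd = np.sqrt(np.diag(cov))           # weighted standard deviations
    return cov / np.outer(sd, sd)


def pca_from_corr(R):
    """Eigen-decompose a symmetric correlation matrix.

    Returns eigenvalues in descending order and the matching
    eigenvectors (principal axes) as columns.
    """
    vals, vecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 4))             # toy data: 20 samples, 4 variables
    w = rng.uniform(0.1, 1.0, size=20)       # hypothetical sample weights
    vals, vecs = pca_from_corr(weighted_corr(X, w))
    print(vals / vals.sum())                 # share of variance per component
```

With uniform weights the stand-in reduces to the usual Pearson correlation matrix, so the same `pca_from_corr` call then gives the usual correlation-based PCA. Any other symmetric correlation matrix could be passed in its place.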