A Weighted Principal Component Analysis and Its Application to Gene Expression Data

Bibliographic details
Published in: IEEE/ACM transactions on computational biology and bioinformatics, 2011-01, Vol.8 (1), p.246-252
Main authors: Pinto da Costa, Joaquim F; Alonso, H; Roque, L
Format: Article
Language: eng
container_end_page 252
container_issue 1
container_start_page 246
container_title IEEE/ACM transactions on computational biology and bioinformatics
container_volume 8
creator Pinto da Costa, Joaquim F
Alonso, H
Roque, L
description In the first part of this work, we introduce new developments in Principal Component Analysis (PCA); in the second part, we introduce a new method to select variables (genes, in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). To illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.
doi_str_mv 10.1109/TCBB.2009.61
format Article
fullrecord <record>
<control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_849480515</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5184803</ieee_id><recordtype>article</recordtype><pqid>1030155140</pqid></control>
<display><type>article</type><title>A Weighted Principal Component Analysis and Its Application to Gene Expression Data</title><source>IEEE Electronic Library (IEL)</source><creator>Pinto da Costa, Joaquim F ; Alonso, H ; Roque, L</creator><identifier>ISSN: 1545-5963</identifier><identifier>EISSN: 1557-9964</identifier><identifier>DOI: 10.1109/TCBB.2009.61</identifier><identifier>PMID: 21071812</identifier><identifier>CODEN: ITCBCY</identifier><language>eng</language><publisher>United States: IEEE</publisher><ispartof>IEEE/ACM transactions on computational biology and bioinformatics, 2011-01, Vol.8 (1), p.246-252</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Jan/Mar 2011</rights><lds50>peer_reviewed</lds50></display>
<links><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5184803$$EHTML$$P50$$Gieee$$H</linktohtml><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/21071812$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links>
</record>
fulltext fulltext_linktorsrc
identifier ISSN: 1545-5963
ispartof IEEE/ACM transactions on computational biology and bioinformatics, 2011-01, Vol.8 (1), p.246-252
issn 1545-5963
1557-9964
language eng
recordid cdi_proquest_miscellaneous_849480515
source IEEE Electronic Library (IEL)
subjects Algorithm design and analysis
Algorithms
Artificial Intelligence
Bioinformatics
Computational Biology - methods
Correlation
Data analysis
Data Mining
Databases, Genetic
Gene expression
Gene Expression Profiling - methods
gene selection
Humans
Iterative algorithms
Metabolomics
microarray data
Noise level
Noise robustness
Oligonucleotide Array Sequence Analysis
Principal component analysis
Principal Component Analysis - methods
Studies
Support vector machines
title A Weighted Principal Component Analysis and Its Application to Gene Expression Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T14%3A06%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Weighted%20Principal%20Component%20Analysis%20and%20Its%20Application%20to%20Gene%20Expression%20Data&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Pinto%20da%20Costa,%20Joaquim%20F&rft.date=2011-01&rft.volume=8&rft.issue=1&rft.spage=246&rft.epage=252&rft.pages=246-252&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2009.61&rft_dat=%3Cproquest_RIE%3E849480515%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1030155140&rft_id=info:pmid/21071812&rft_ieee_id=5184803&rfr_iscdi=true
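The description above summarizes WPCA only as "PCA built on a new correlation coefficient used in place of Pearson's"; this record does not give the coefficient itself. The sketch below illustrates only that general mechanism, under loudly stated assumptions: the coefficient is a pluggable function, Spearman's rank correlation is used purely as a familiar outlier-resistant stand-in (it is NOT the authors' coefficient), and eigenpairs are found by plain power iteration with deflation so the example has no dependencies. All function names (`correlation_pca`, `top_eigenpairs`, etc.) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One variable (e.g., one gene) observed over n samples.
Column = Sequence[float]
# Any pairwise correlation coefficient: Pearson's, or a hypothetical replacement.
Correlation = Callable[[Column, Column], float]


def pearson(x: Column, y: Column) -> float:
    """Pearson's product-moment correlation coefficient (the usual PCA's choice)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(x: Column) -> List[float]:
    """1-based ranks of x; tied values share their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: Column, y: Column) -> float:
    """STAND-IN coefficient: Spearman's rho (Pearson on ranks).
    Assumption for illustration only; not the coefficient proposed in the paper."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(columns: Sequence[Column], corr: Correlation) -> List[List[float]]:
    """p x p matrix of pairwise coefficients, with 1.0 on the diagonal."""
    p = len(columns)
    return [[1.0 if i == j else corr(columns[i], columns[j]) for j in range(p)]
            for i in range(p)]


def top_eigenpairs(m: List[List[float]], k: int,
                   iters: int = 1000) -> List[Tuple[float, List[float]]]:
    """Leading k eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration with deflation (simple, dependency-free)."""
    p = len(m)
    a = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        # Non-uniform unit start vector: usually not orthogonal to the target eigenvector.
        v = [float(i + 1) for i in range(p)]
        s = math.sqrt(sum(c * c for c in v))
        v = [c / s for c in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        # Rayleigh quotient v^T A v gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next component becomes dominant.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def correlation_pca(columns: Sequence[Column], corr: Correlation, k: int):
    """PCA on the correlation matrix built with `corr`.
    Returns (fraction of total variance, loading vector) for the top k components;
    the total is p because the matrix has 1.0 on its diagonal (trace p)."""
    p = len(columns)
    return [(lam / p, v) for lam, v in top_eigenpairs(correlation_matrix(columns, corr), k)]


if __name__ == "__main__":
    gene_a = [1, 2, 3, 4, 5, 6, 7, 8]
    gene_b = [8, 7, 6, 5, 4, 3, 2, 50]  # decreasing trend plus one outlier
    for name, corr in (("Pearson", pearson), ("Spearman stand-in", spearman)):
        print(name, round(corr(gene_a, gene_b), 3))
        for frac, loadings in correlation_pca([gene_a, gene_b], corr, 2):
            print("  ", round(frac, 3), [round(c, 3) for c in loadings])
```

In the demo, the single outlier in `gene_b` makes Pearson's coefficient positive (about 0.47) even though the two genes otherwise move in opposite directions, so the first Pearson-based component loads both genes with the same sign; the rank-based stand-in gives -1/3 and opposite-sign loadings. Power iteration is used only to avoid a dependency; `numpy.linalg.eigh` would be the usual choice in practice.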