A Weighted Principal Component Analysis and Its Application to Gene Expression Data

In this work, we introduce in the first part new developments in Principal Component Analysis (PCA) and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the d...

Bibliographic details
Published in:	IEEE/ACM transactions on computational biology and bioinformatics 2011-01, Vol.8 (1), p.246-252
Main authors:	Pinto da Costa, Joaquim F; Alonso, H; Roque, L
Format:	Article
Language:	eng
Subjects:	Algorithm design and analysis; Algorithms; Artificial Intelligence; Bioinformatics; Computational Biology - methods; Correlation; Data analysis; Data Mining; Databases, Genetic; Gene expression; Gene Expression Profiling - methods; gene selection; Humans; Iterative algorithms; Metabolomics; microarray data; Noise level; Noise robustness; Oligonucleotide Array Sequence Analysis; Principal component analysis; Principal Component Analysis - methods; Studies; Support vector machines

container_end_page	252
container_issue	1
container_start_page	246
container_title	IEEE/ACM transactions on computational biology and bioinformatics
container_volume	8
creator	Pinto da Costa, Joaquim F; Alonso, H; Roque, L
description	In this work, we introduce in the first part new developments in Principal Component Analysis (PCA) and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate to deal with this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.
doi_str_mv	10.1109/TCBB.2009.61
format	Article
eissn	1557-9964
pmid	21071812
coden	ITCBCY
publisher	United States: IEEE
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Jan/Mar 2011
toplevel	peer_reviewed; online_resources
linktohtml	https://ieeexplore.ieee.org/document/5184803
backlink	https://www.ncbi.nlm.nih.gov/pubmed/21071812
tpages	7
fulltext	fulltext_linktorsrc
identifier	ISSN: 1545-5963
ispartof	IEEE/ACM transactions on computational biology and bioinformatics, 2011-01, Vol.8 (1), p.246-252
issn	1545-5963 1557-9964
language	eng
recordid	cdi_proquest_miscellaneous_849480515
source	IEEE Electronic Library (IEL)
subjects	Algorithm design and analysis; Algorithms; Artificial Intelligence; Bioinformatics; Computational Biology - methods; Correlation; Data analysis; Data Mining; Databases, Genetic; Gene expression; Gene Expression Profiling - methods; gene selection; Humans; Iterative algorithms; Metabolomics; microarray data; Noise level; Noise robustness; Oligonucleotide Array Sequence Analysis; Principal component analysis; Principal Component Analysis - methods; Studies; Support vector machines
title	A Weighted Principal Component Analysis and Its Application to Gene Expression Data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T14%3A06%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Weighted%20Principal%20Component%20Analysis%20and%20Its%20Application%20to%20Gene%20Expression%20Data&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Pinto%20da%20Costa,%20Joaquim%20F&rft.date=2011-01&rft.volume=8&rft.issue=1&rft.spage=246&rft.epage=252&rft.pages=246-252&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2009.61&rft_dat=%3Cproquest_RIE%3E849480515%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1030155140&rft_id=info:pmid/21071812&rft_ieee_id=5184803&rfr_iscdi=true