Graphical Exploration of Gene Expression Data: A Comparative Study of Three Multivariate Methods

This article describes three multivariate projection methods and compares them for their ability to identify clusters of biological samples and genes using real-life data on gene expression levels of leukemia patients. It is shown that principal component analysis (PCA) has the disadvantage that the...

Bibliographic details
Published in: Biometrics 2003-12, Vol.59 (4), p.1131-1139
Main authors: Wouters, Luc, Göhlmann, Hinrich W., Bijnens, Luc, Kass, Stefan U., Molenberghs, Geert, Lewi, Paul J.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1139
container_issue 4
container_start_page 1131
container_title Biometrics
container_volume 59
creator Wouters, Luc
Göhlmann, Hinrich W.
Bijnens, Luc
Kass, Stefan U.
Molenberghs, Geert
Lewi, Paul J.
description This article describes three multivariate projection methods and compares them for their ability to identify clusters of biological samples and genes using real-life data on gene expression levels of leukemia patients. It is shown that principal component analysis (PCA) has the disadvantage that the resulting principal factors are not very informative, while correspondence factor analysis (CFA) has difficulties interpreting distances between objects. Spectral map analysis (SMA) is introduced as an alternative approach to the analysis of microarray data. Weighted SMA outperforms PCA, and is at least as powerful as CFA, in finding clusters in the samples, as well as identifying genes related to these clusters. SMA addresses the problem of data analysis in microarray experiments in a more appropriate manner than CFA, and allows more flexible weighting to the genes and samples. Proper weighting is important, since it enables less reliable data to be down-weighted and more reliable information to be emphasized.
doi_str_mv 10.1111/j.0006-341X.2003.00130.x
format Article
fullrecord <record><control><sourceid>jstor_proqu</sourceid><recordid>TN_cdi_proquest_miscellaneous_71582040</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>3695355</jstor_id><sourcerecordid>3695355</sourcerecordid><originalsourceid>FETCH-LOGICAL-c5290-c953447efa0f54ca4c7b6f384382f688efbb7abe95cea7ea587b2e5f773ab4dc3</originalsourceid><addsrcrecordid>eNqNkM9v0zAUxy0EYmXsP0AoJ24pdmzHCRKHrYwyaWWH_YCbeXGf1ZS0zuwE2v8eh1Tddb7Y773P91n6EJIwOmXxfFxPKaV5ygX7Oc0o5bFknE53L8iEScFSKjL6kkyO0Al5E8I6lqWk2WtywkSZl6IUE_Jr7qFd1Qaa5HLXNs5DV7tt4mwyxy0OPY8hDK0v0MGn5DyZuU0LA_YHk9uuX-4H-G7lEZNF38Q2-Bq6WGC3csvwlryy0AQ8O9yn5P7r5d3sW3p9M7-anV-nRmYlTU0puRAKLVArhQFhVJVbXgheZDYvCrRVpaDCUhoEhSALVWUorVIcKrE0_JR8GPe23j32GDq9qYPBpoEtuj5oxWSRUUEjWIyg8S4Ej1a3vt6A32tG9WBXr_UgTg_i9GBX_7erdzH6_vBHX21w-RQ86IzA5xH4Wze4f_ZifXF1s4ivmH835tehc_6Y53m0I2Ucp-O4Dh3ujmPwv3WuuJL6x_e5vlXigV08cL3g_wApxaJZ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>71582040</pqid></control><display><type>article</type><title>Graphical Exploration of Gene Expression Data: A Comparative Study of Three Multivariate Methods</title><source>MEDLINE</source><source>JSTOR Mathematics &amp; Statistics</source><source>Access via Wiley Online Library</source><source>JSTOR Archive Collection A-Z Listing</source><source>Oxford University Press Journals All Titles (1996-Current)</source><creator>Wouters, Luc ; Göhlmann, Hinrich W. ; Bijnens, Luc ; Kass, Stefan U. ; Molenberghs, Geert ; Lewi, Paul J.</creator><creatorcontrib>Wouters, Luc ; Göhlmann, Hinrich W. ; Bijnens, Luc ; Kass, Stefan U. ; Molenberghs, Geert ; Lewi, Paul J.</creatorcontrib><description>This article describes three multivariate projection methods and compares them for their ability to identify clusters of biological samples and genes using real-life data on gene expression levels of leukemia patients. 
It is shown that principal component analysis (PCA) has the disadvantage that the resulting principal factors are not very informative, while correspondence factor analysis (CFA) has difficulties interpreting distances between objects. Spectral map analysis (SMA) is introduced as an alternative approach to the analysis of microarray data. Weighted SMA outperforms PCA, and is at least as powerful as CFA, in finding clusters in the samples, as well as identifying genes related to these clusters. SMA addresses the problem of data analysis in microarray experiments in a more appropriate manner than CFA, and allows more flexible weighting to the genes and samples. Proper weighting is important, since it enables less reliable data to be down-weighted and more reliable information to be emphasized.</description><identifier>ISSN: 0006-341X</identifier><identifier>EISSN: 1541-0420</identifier><identifier>DOI: 10.1111/j.0006-341X.2003.00130.x</identifier><identifier>PMID: 14969494</identifier><language>eng</language><publisher>350 Main Street , Malden , MA 02148 , U.S.A , and P.O. 
Box 1354, 9600 Garsington Road , Oxford OX4 2DQ , U.K: Blackwell Publishing</publisher><subject>Bioinformatics ; Biometry - methods ; Biplot ; Consultant's Forum ; Correspondence factor analysis ; Data mining ; Data visualization ; Datasets ; Gene Expression ; Gene expression data ; Genes ; Genetic mapping ; Leukemia ; Microarray data ; Models, Genetic ; Models, Statistical ; Multivariate Analysis ; Multivariate exploratory data analysis ; Oligonucleotide Array Sequence Analysis - methods ; Principal component analysis ; Principal components analysis ; Reproducibility of Results ; Spectral map analysis ; Spectroscopic analysis ; Statistical variance ; T lymphocytes ; Term weighting</subject><ispartof>Biometrics, 2003-12, Vol.59 (4), p.1131-1139</ispartof><rights>Copyright 2003 The International Biometric Society</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c5290-c953447efa0f54ca4c7b6f384382f688efbb7abe95cea7ea587b2e5f773ab4dc3</citedby><cites>FETCH-LOGICAL-c5290-c953447efa0f54ca4c7b6f384382f688efbb7abe95cea7ea587b2e5f773ab4dc3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/3695355$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/3695355$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>314,780,784,803,832,1417,27924,27925,45574,45575,58017,58021,58250,58254</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/14969494$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Wouters, Luc</creatorcontrib><creatorcontrib>Göhlmann, Hinrich W.</creatorcontrib><creatorcontrib>Bijnens, Luc</creatorcontrib><creatorcontrib>Kass, Stefan U.</creatorcontrib><creatorcontrib>Molenberghs, Geert</creatorcontrib><creatorcontrib>Lewi, Paul 
J.</creatorcontrib><title>Graphical Exploration of Gene Expression Data: A Comparative Study of Three Multivariate Methods</title><title>Biometrics</title><addtitle>Biometrics</addtitle><description>This article describes three multivariate projection methods and compares them for their ability to identify clusters of biological samples and genes using real-life data on gene expression levels of leukemia patients. It is shown that principal component analysis (PCA) has the disadvantage that the resulting principal factors are not very informative, while correspondence factor analysis (CFA) has difficulties interpreting distances between objects. Spectral map analysis (SMA) is introduced as an alternative approach to the analysis of microarray data. Weighted SMA outperforms PCA, and is at least as powerful as CFA, in finding clusters in the samples, as well as identifying genes related to these clusters. SMA addresses the problem of data analysis in microarray experiments in a more appropriate manner than CFA, and allows more flexible weighting to the genes and samples. 
Proper weighting is important, since it enables less reliable data to be down-weighted and more reliable information to be emphasized.</description><subject>Bioinformatics</subject><subject>Biometry - methods</subject><subject>Biplot</subject><subject>Consultant's Forum</subject><subject>Correspondence factor analysis</subject><subject>Data mining</subject><subject>Data visualization</subject><subject>Datasets</subject><subject>Gene Expression</subject><subject>Gene expression data</subject><subject>Genes</subject><subject>Genetic mapping</subject><subject>Leukemia</subject><subject>Microarray data</subject><subject>Models, Genetic</subject><subject>Models, Statistical</subject><subject>Multivariate Analysis</subject><subject>Multivariate exploratory data analysis</subject><subject>Oligonucleotide Array Sequence Analysis - methods</subject><subject>Principal component analysis</subject><subject>Principal components analysis</subject><subject>Reproducibility of Results</subject><subject>Spectral map analysis</subject><subject>Spectroscopic analysis</subject><subject>Statistical variance</subject><subject>T lymphocytes</subject><subject>Term weighting</subject><issn>0006-341X</issn><issn>1541-0420</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2003</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkM9v0zAUxy0EYmXsP0AoJ24pdmzHCRKHrYwyaWWH_YCbeXGf1ZS0zuwE2v8eh1Tddb7Y773P91n6EJIwOmXxfFxPKaV5ygX7Oc0o5bFknE53L8iEScFSKjL6kkyO0Al5E8I6lqWk2WtywkSZl6IUE_Jr7qFd1Qaa5HLXNs5DV7tt4mwyxy0OPY8hDK0v0MGn5DyZuU0LA_YHk9uuX-4H-G7lEZNF38Q2-Bq6WGC3csvwlryy0AQ8O9yn5P7r5d3sW3p9M7-anV-nRmYlTU0puRAKLVArhQFhVJVbXgheZDYvCrRVpaDCUhoEhSALVWUorVIcKrE0_JR8GPe23j32GDq9qYPBpoEtuj5oxWSRUUEjWIyg8S4Ej1a3vt6A32tG9WBXr_UgTg_i9GBX_7erdzH6_vBHX21w-RQ86IzA5xH4Wze4f_ZifXF1s4ivmH835tehc_6Y53m0I2Ucp-O4Dh3ujmPwv3WuuJL6x_e5vlXigV08cL3g_wApxaJZ</recordid><startdate>200312</startdate><enddate>200312</enddate><creator>Wouters, Luc</creator><creator>Göhlmann, 
Hinrich W.</creator><creator>Bijnens, Luc</creator><creator>Kass, Stefan U.</creator><creator>Molenberghs, Geert</creator><creator>Lewi, Paul J.</creator><general>Blackwell Publishing</general><general>International Biometric Society</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>200312</creationdate><title>Graphical Exploration of Gene Expression Data: A Comparative Study of Three Multivariate Methods</title><author>Wouters, Luc ; Göhlmann, Hinrich W. ; Bijnens, Luc ; Kass, Stefan U. ; Molenberghs, Geert ; Lewi, Paul J.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c5290-c953447efa0f54ca4c7b6f384382f688efbb7abe95cea7ea587b2e5f773ab4dc3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2003</creationdate><topic>Bioinformatics</topic><topic>Biometry - methods</topic><topic>Biplot</topic><topic>Consultant's Forum</topic><topic>Correspondence factor analysis</topic><topic>Data mining</topic><topic>Data visualization</topic><topic>Datasets</topic><topic>Gene Expression</topic><topic>Gene expression data</topic><topic>Genes</topic><topic>Genetic mapping</topic><topic>Leukemia</topic><topic>Microarray data</topic><topic>Models, Genetic</topic><topic>Models, Statistical</topic><topic>Multivariate Analysis</topic><topic>Multivariate exploratory data analysis</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Principal component analysis</topic><topic>Principal components analysis</topic><topic>Reproducibility of Results</topic><topic>Spectral map analysis</topic><topic>Spectroscopic analysis</topic><topic>Statistical variance</topic><topic>T lymphocytes</topic><topic>Term 
weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wouters, Luc</creatorcontrib><creatorcontrib>Göhlmann, Hinrich W.</creatorcontrib><creatorcontrib>Bijnens, Luc</creatorcontrib><creatorcontrib>Kass, Stefan U.</creatorcontrib><creatorcontrib>Molenberghs, Geert</creatorcontrib><creatorcontrib>Lewi, Paul J.</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Biometrics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wouters, Luc</au><au>Göhlmann, Hinrich W.</au><au>Bijnens, Luc</au><au>Kass, Stefan U.</au><au>Molenberghs, Geert</au><au>Lewi, Paul J.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Graphical Exploration of Gene Expression Data: A Comparative Study of Three Multivariate Methods</atitle><jtitle>Biometrics</jtitle><addtitle>Biometrics</addtitle><date>2003-12</date><risdate>2003</risdate><volume>59</volume><issue>4</issue><spage>1131</spage><epage>1139</epage><pages>1131-1139</pages><issn>0006-341X</issn><eissn>1541-0420</eissn><abstract>This article describes three multivariate projection methods and compares them for their ability to identify clusters of biological samples and genes using real-life data on gene expression levels of leukemia patients. It is shown that principal component analysis (PCA) has the disadvantage that the resulting principal factors are not very informative, while correspondence factor analysis (CFA) has difficulties interpreting distances between objects. Spectral map analysis (SMA) is introduced as an alternative approach to the analysis of microarray data. 
Weighted SMA outperforms PCA, and is at least as powerful as CFA, in finding clusters in the samples, as well as identifying genes related to these clusters. SMA addresses the problem of data analysis in microarray experiments in a more appropriate manner than CFA, and allows more flexible weighting to the genes and samples. Proper weighting is important, since it enables less reliable data to be down-weighted and more reliable information to be emphasized.</abstract><cop>350 Main Street , Malden , MA 02148 , U.S.A , and P.O. Box 1354, 9600 Garsington Road , Oxford OX4 2DQ , U.K</cop><pub>Blackwell Publishing</pub><pmid>14969494</pmid><doi>10.1111/j.0006-341X.2003.00130.x</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0006-341X
ispartof Biometrics, 2003-12, Vol.59 (4), p.1131-1139
issn 0006-341X
1541-0420
language eng
recordid cdi_proquest_miscellaneous_71582040
source MEDLINE; JSTOR Mathematics & Statistics; Access via Wiley Online Library; JSTOR Archive Collection A-Z Listing; Oxford University Press Journals All Titles (1996-Current)
subjects Bioinformatics
Biometry - methods
Biplot
Consultant's Forum
Correspondence factor analysis
Data mining
Data visualization
Datasets
Gene Expression
Gene expression data
Genes
Genetic mapping
Leukemia
Microarray data
Models, Genetic
Models, Statistical
Multivariate Analysis
Multivariate exploratory data analysis
Oligonucleotide Array Sequence Analysis - methods
Principal component analysis
Principal components analysis
Reproducibility of Results
Spectral map analysis
Spectroscopic analysis
Statistical variance
T lymphocytes
Term weighting
title Graphical Exploration of Gene Expression Data: A Comparative Study of Three Multivariate Methods
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T16%3A29%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Graphical%20Exploration%20of%20Gene%20Expression%20Data:%20A%20Comparative%20Study%20of%20Three%20Multivariate%20Methods&rft.jtitle=Biometrics&rft.au=Wouters,%20Luc&rft.date=2003-12&rft.volume=59&rft.issue=4&rft.spage=1131&rft.epage=1139&rft.pages=1131-1139&rft.issn=0006-341X&rft.eissn=1541-0420&rft_id=info:doi/10.1111/j.0006-341X.2003.00130.x&rft_dat=%3Cjstor_proqu%3E3695355%3C/jstor_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=71582040&rft_id=info:pmid/14969494&rft_jstor_id=3695355&rfr_iscdi=true
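
The description field in this record names weighted spectral map analysis (SMA) but does not spell out the computation. As a rough illustration only, the sketch below assumes the commonly described formulation of SMA: a logarithmic transform of the expression table followed by weighted double-centering of rows (genes) and columns (samples). The function name `weighted_double_center`, its parameters, and the toy tables are hypothetical and are not taken from the article. The factorization step that produces the actual biplot is omitted here.

```python
import math


def weighted_double_center(table, row_weights=None, col_weights=None):
    """Log-transform a positive table, then double-center it with weights.

    Hypothetical helper for illustration; not the authors' implementation.
    `table` is a list of rows (e.g. genes) over columns (e.g. samples).
    """
    n_rows, n_cols = len(table), len(table[0])
    # Assumption: constant weights when none are supplied.
    w = row_weights or [1.0] * n_rows
    v = col_weights or [1.0] * n_cols
    # Normalize both weight vectors so that each sums to 1.
    w_sum, v_sum = sum(w), sum(v)
    w = [x / w_sum for x in w]
    v = [x / v_sum for x in v]
    # Logarithmic transform (requires strictly positive entries).
    logs = [[math.log(x) for x in row] for row in table]
    # Weighted row means, column means, and grand mean.
    row_means = [sum(v[j] * logs[i][j] for j in range(n_cols))
                 for i in range(n_rows)]
    col_means = [sum(w[i] * logs[i][j] for i in range(n_rows))
                 for j in range(n_cols)]
    grand = sum(w[i] * row_means[i] for i in range(n_rows))
    # Remove row and column effects; only the interaction part remains.
    return [[logs[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)]
            for i in range(n_rows)]


# Rows that differ only by a constant factor carry no contrast information,
# so every entry of the double-centered table is zero up to rounding.
proportional = [[1.0, 2.0, 4.0], [10.0, 20.0, 40.0]]
centered = weighted_double_center(proportional, col_weights=[1.0, 2.0, 3.0])
```

Under these assumptions, giving a row or column a smaller weight reduces its influence on the means that are subtracted, which is one way less reliable measurements could be down-weighted as the description suggests.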