Expression reflects population structure

Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of...

Detailed description

Bibliographic details
Published in: PLoS genetics 2018-12, Vol.14 (12), p.e1007841-e1007841
Main authors: Brown, Brielin C, Bray, Nicolas L, Pachter, Lior
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e1007841
container_issue 12
container_start_page e1007841
container_title PLoS genetics
container_volume 14
creator Brown, Brielin C
Bray, Nicolas L
Pachter, Lior
description Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate.
doi_str_mv 10.1371/journal.pgen.1007841
format Article
fullrecord Record TN_cdi_plos_journals_2251022676 (source format XML, source ID gale_plos_, Gale ID A568167832, DOAJ ID oai_doaj_org_article_5ae14d601c7a4ecf907a3419eac6c9f3, ProQuest ID 2251022676). The title, creators, abstract, subjects, ISSNs, DOI, volume, issue and pages repeat the fields listed in this record; the remaining fields are:
publisher United States: Public Library of Science
contributor Di Rienzo, Anna
pmid 30566439
orcid https://orcid.org/0000-0001-5569-5223
date 2018-12-19
peer_reviewed yes
open_access free_for_read
rights COPYRIGHT 2018 Public Library of Science. 2018 Brown et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6317812/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6317812/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/30566439
fulltext fulltext
identifier ISSN: 1553-7404
ispartof PLoS genetics, 2018-12, Vol.14 (12), p.e1007841-e1007841
issn 1553-7404
1553-7390
1553-7404
language eng
recordid cdi_plos_journals_2251022676
source MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Bioinformatics
Biology
Biology and Life Sciences
Consortia
Correlation analysis
Datasets
Female
Gene Expression
Gene Frequency
Genetic Variation
Genetics, Population
Genomics
Genotype
Genotype & phenotype
Genotypes
Humans
Male
Methods
Ontology
People and Places
Physical Sciences
Polymorphism, Single Nucleotide
Population
Population structure
Principal Component Analysis
Principal components analysis
Quantitative Trait Loci
Research and Analysis Methods
Sequence Analysis, RNA
Single nucleotide polymorphisms
Single-nucleotide polymorphism
Whole Genome Sequencing
title Expression reflects population structure
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T04%3A24%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Expression%20reflects%20population%20structure&rft.jtitle=PLoS%20genetics&rft.au=Brown,%20Brielin%20C&rft.date=2018-12-19&rft.volume=14&rft.issue=12&rft.spage=e1007841&rft.epage=e1007841&rft.pages=e1007841-e1007841&rft.issn=1553-7404&rft.eissn=1553-7404&rft_id=info:doi/10.1371/journal.pgen.1007841&rft_dat=%3Cgale_plos_%3EA568167832%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2251022676&rft_id=info:pmid/30566439&rft_galeid=A568167832&rft_doaj_id=oai_doaj_org_article_5ae14d601c7a4ecf907a3419eac6c9f3&rfr_iscdi=true
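
The description field above summarizes an approach that couples the principal components of genotype to the principal components of gene expression via canonical correlation analysis. The following is a minimal sketch of that general idea and not the authors' implementation: the synthetic data, the two-population label `pop`, the helper names `top_pcs` and `cca`, and the numbers of components (5 and 20) are all assumptions chosen for illustration. CCA is computed here with a standard QR-plus-SVD formulation in NumPy.

```python
import numpy as np


def top_pcs(M, k):
    """Scores of the top-k principal components of M (samples x features)."""
    Mc = M - M.mean(axis=0)  # center each feature
    U, S, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, :k] * S[:k]


def cca(X, Y):
    """Canonical correlations and canonical variates of X and Y (QR + SVD)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis for the columns of Xc
    Qy, _ = np.linalg.qr(Yc)  # orthonormal basis for the columns of Yc
    U, rho, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    return rho, Qx @ U, Qy @ Vt.T


# Hypothetical synthetic data: two populations with different allele
# frequencies, and a modest expression shift in the first 50 genes.
rng = np.random.default_rng(0)
n = 200
pop = rng.integers(0, 2, size=n)
freqs = np.where(pop[:, None] == 0, 0.2, 0.6) * np.ones((n, 300))
geno = rng.binomial(2, freqs).astype(float)  # genotypes coded 0/1/2
expr = rng.normal(size=(n, 500))
expr[:, :50] += 0.8 * pop[:, None]

# Reduce each data type to a few principal components, then couple them.
geno_pcs = top_pcs(geno, 5)
expr_pcs = top_pcs(expr, 20)
rho, geno_variates, expr_variates = cca(geno_pcs, expr_pcs)

print(np.round(rho, 3))  # canonical correlations, largest first
```

In this sketch the columns of `expr_variates` play the role of a linear projection of expression that is maximally correlated with genotype structure; the per-gene significance testing mentioned in the description is not reproduced here.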