Expression reflects population structure
Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of...
Published in: | PLoS genetics 2018-12, Vol.14 (12), p.e1007841-e1007841 |
---|---|
Main authors: | Brown, Brielin C; Bray, Nicolas L; Pachter, Lior |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e1007841 |
---|---|
container_issue | 12 |
container_start_page | e1007841 |
container_title | PLoS genetics |
container_volume | 14 |
creator | Brown, Brielin C; Bray, Nicolas L; Pachter, Lior |
description | Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate. |
doi_str_mv | 10.1371/journal.pgen.1007841 |
format | Article |
contributor | Di Rienzo, Anna |
date | 2018-12-19 |
eissn | 1553-7404 |
orcidid | https://orcid.org/0000-0001-5569-5223 |
pmid | 30566439 |
publisher | United States: Public Library of Science |
rights | 2018 Brown et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the "License"), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
fulltext | fulltext |
identifier | ISSN: 1553-7404 |
ispartof | PLoS genetics, 2018-12, Vol.14 (12), p.e1007841-e1007841 |
issn | 1553-7404; 1553-7390 (EISSN: 1553-7404) |
language | eng |
recordid | cdi_plos_journals_2251022676 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central |
subjects | Bioinformatics; Biology; Biology and Life Sciences; Consortia; Correlation analysis; Datasets; Female; Gene Expression; Gene Frequency; Genetic Variation; Genetics, Population; Genomics; Genotype; Genotype & phenotype; Genotypes; Humans; Male; Methods; Ontology; People and Places; Physical Sciences; Polymorphism, Single Nucleotide; Population; Population structure; Principal Component Analysis; Principal components analysis; Quantitative Trait Loci; Research and Analysis Methods; Sequence Analysis, RNA; Single nucleotide polymorphisms; Single-nucleotide polymorphism; Whole Genome Sequencing |
title | Expression reflects population structure |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T04%3A24%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Expression%20reflects%20population%20structure&rft.jtitle=PLoS%20genetics&rft.au=Brown,%20Brielin%20C&rft.date=2018-12-19&rft.volume=14&rft.issue=12&rft.spage=e1007841&rft.epage=e1007841&rft.pages=e1007841-e1007841&rft.issn=1553-7404&rft.eissn=1553-7404&rft_id=info:doi/10.1371/journal.pgen.1007841&rft_dat=%3Cgale_plos_%3EA568167832%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2251022676&rft_id=info:pmid/30566439&rft_galeid=A568167832&rft_doaj_id=oai_doaj_org_article_5ae14d601c7a4ecf907a3419eac6c9f3&rfr_iscdi=true |
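
The description in this record says the approach couples the principal components of genotype to the principal components of gene expression via canonical correlation analysis. The following is only a minimal sketch of that general idea on synthetic data, assuming NumPy is available; it is not the authors' implementation. The function names `top_pcs` and `cca`, the simulated matrices, and the choice of five components are all hypothetical, and the per-gene significance test mentioned in the description is not covered.

```python
import numpy as np

def top_pcs(X, k):
    """Top-k principal component scores (samples x k) of X (samples x features)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]                       # scores = U * singular values

def cca(A, B):
    """Canonical correlation analysis via QR + SVD (assumes full column rank).

    Returns canonical variates for A and B, and the canonical correlations.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    Qa, _ = np.linalg.qr(A)                       # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)                       # orthonormal basis for span(B)
    U, rho, Vt = np.linalg.svd(Qa.T @ Qb, full_matrices=False)
    return Qa @ U, Qb @ Vt.T, rho

# --- synthetic data (hypothetical; for illustration only) ---
rng = np.random.default_rng(0)
n = 120
pop = np.repeat([0, 1], n // 2)                   # two simulated populations

# Genotypes: allele counts in {0, 1, 2}, allele frequency differs by population
freq = np.where(pop[:, None] == 0, 0.2, 0.6)
geno = rng.binomial(2, freq, size=(n, 300)).astype(float)

# Expression: noise + a strong non-population factor + a weak population signal
batch = rng.integers(0, 2, size=n)
expr = rng.normal(size=(n, 200))
expr += 3.0 * batch[:, None] * rng.normal(size=(1, 200))
expr += 0.5 * pop[:, None] * rng.normal(size=(1, 200))

k = 5                                             # arbitrary number of PCs
geno_pcs = top_pcs(geno, k)
expr_pcs = top_pcs(expr, k)

# Couple the two sets of PCs; expr_cv is a linear projection of expression
geno_cv, expr_cv, rho = cca(geno_pcs, expr_pcs)
print("canonical correlations:", np.round(rho, 3))
```

In this sketch, `expr_cv[:, 0]` is the linear projection of the expression PCs that is most correlated with a linear projection of the genotype PCs, so it can be plotted against population labels to check whether the structure becomes visible. Running CCA on a handful of PCs rather than on the raw matrices keeps both inputs low-dimensional and full rank, which the QR-based solver here relies on.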