A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data

Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage a...

Detailed description

Bibliographic details
Published in:	PLoS genetics 2015-11, Vol.11 (11), p.e1005650-e1005650
Main authors:	Lea, Amanda J; Tung, Jenny; Zhou, Xiang
Format:	Article
Language:	eng
Subjects:	Age; Algorithms; Alzheimers disease; Analysis; Arabidopsis thaliana; CpG Islands - genetics; Datasets; Deoxyribonucleic acid; DNA; DNA methylation; DNA Methylation - genetics; Epigenetics; Experiments; Gene expression; Genetic aspects; Genomes; Grants; High-Throughput Nucleotide Sequencing; Humans; Methods; Methylation; Parameter estimation; Physiological aspects; Population; Sequence Analysis, DNA; Software; Studies
Online access:	Full text

container_end_page	e1005650
container_issue	11
container_start_page	e1005650
container_title	PLoS genetics
container_volume	11
creator	Lea, Amanda J; Tung, Jenny; Zhou, Xiang
description	Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and because of the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to simultaneously account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html.
doi_str_mv	10.1371/journal.pgen.1005650
format	Article
fullrecord	<record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_1749629261</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A436982723</galeid><doaj_id>oai_doaj_org_article_98aaac149487409ca7fe760976848339</doaj_id><sourcerecordid>A436982723</sourcerecordid><originalsourceid>FETCH-LOGICAL-c698t-cd47943c1ab04e87a4128de524753c81b60ea24a3d092b503041f035a9c35df03</originalsourceid><addsrcrecordid>eNqVk1tv0zAYhiMEYmPwDxBEQkIg0WLHp_gGqewAldZNYsCt5TqfU09uXOIEtf8eZ-2mVuIClAufnve1_Tpflr3EaIyJwB9vQ9822o9XNTRjjBDjDD3KjjFjZCQooo_3-kfZsxhvESKslOJpdlRwJiWT_DjrJvmFh7Wbe_iQn1vrjIOmyz-7Jiyd9vnMraHKZ6ECn9vQ5tMqLTu7cU2dnzlroR3GCTy7muQz6BYbrzsXmtw1yST23roO8hv41UNj7kS608-zJ1b7CC927Un24-L8--nX0eX1l-np5HJkuCy7kamokJQYrOeIQik0xUVZASuoYMSUeM4R6IJqUiFZzBkiiGKb7qilIaxKvZPs9dZ35UNUu8CiwoJKXsiC40RMt0QV9K1atW6p240K2qm7idDWSredMx6ULLXWBlNJyxSpNFpYEBxJwUtaEiKT16fdbv18CZVJwbTaH5gerjRuoerwW1HOhGQ8GbzbGbQh5RU7tXTRgPe6gdAP5yacC5lumdA3W7TW6WiusSE5mgFXE0pSeoUoSKLGf6HSV8HSmdCAdWn-QPD-QJCYDtZdrfsY1fTm23-wV__OXv88ZN_usQvQvlvE4Pvhr4qHIN2Cpg0xtmAfosZIDSVy_-JqKBG1K5Eke7X_TA-i-5ogfwD2SAoI</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1736679041</pqid></control><display><type>article</type><title>A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data</title><source>Public Library of Science (PLoS) Journals Open Access</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><creator>Lea, Amanda J ; Tung, Jenny ; Zhou, Xiang</creator><creatorcontrib>Lea, Amanda J ; Tung, Jenny ; Zhou, Xiang</creatorcontrib><description>Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. 
Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and because of the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to simultaneously account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. 
MACAU is freely available at www.xzlab.org/software.html.</description><identifier>ISSN: 1553-7404</identifier><identifier>ISSN: 1553-7390</identifier><identifier>EISSN: 1553-7404</identifier><identifier>DOI: 10.1371/journal.pgen.1005650</identifier><identifier>PMID: 26599596</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Age ; Algorithms ; Alzheimers disease ; Analysis ; Arabidopsis thaliana ; CpG Islands - genetics ; Datasets ; Deoxyribonucleic acid ; DNA ; DNA methylation ; DNA Methylation - genetics ; Epigenetics ; Experiments ; Gene expression ; Genetic aspects ; Genomes ; Grants ; High-Throughput Nucleotide Sequencing ; Humans ; Methods ; Methylation ; Parameter estimation ; Physiological aspects ; Population ; Sequence Analysis, DNA ; Software ; Studies</subject><ispartof>PLoS genetics, 2015-11, Vol.11 (11), p.e1005650-e1005650</ispartof><rights>COPYRIGHT 2015 Public Library of Science</rights><rights>2015 Lea et al 2015 Lea et al</rights><rights>2015 Public Library of Science. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited: Lea AJ, Tung J, Zhou X (2015) A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data. PLoS Genet 11(11): e1005650. 
doi:10.1371/journal.pgen.1005650</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c698t-cd47943c1ab04e87a4128de524753c81b60ea24a3d092b503041f035a9c35df03</citedby><cites>FETCH-LOGICAL-c698t-cd47943c1ab04e87a4128de524753c81b60ea24a3d092b503041f035a9c35df03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4657956/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4657956/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2096,2915,23845,27901,27902,53766,53768,79343,79344</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/26599596$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lea, Amanda J</creatorcontrib><creatorcontrib>Tung, Jenny</creatorcontrib><creatorcontrib>Zhou, Xiang</creatorcontrib><title>A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data</title><title>PLoS genetics</title><addtitle>PLoS Genet</addtitle><description>Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and because of the computational challenges of controlling for genetic covariance in count data. 
To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to simultaneously account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. 
MACAU is freely available at www.xzlab.org/software.html.</description><subject>Age</subject><subject>Algorithms</subject><subject>Alzheimers disease</subject><subject>Analysis</subject><subject>Arabidopsis thaliana</subject><subject>CpG Islands - genetics</subject><subject>Datasets</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>DNA methylation</subject><subject>DNA Methylation - genetics</subject><subject>Epigenetics</subject><subject>Experiments</subject><subject>Gene expression</subject><subject>Genetic aspects</subject><subject>Genomes</subject><subject>Grants</subject><subject>High-Throughput Nucleotide Sequencing</subject><subject>Humans</subject><subject>Methods</subject><subject>Methylation</subject><subject>Parameter estimation</subject><subject>Physiological aspects</subject><subject>Population</subject><subject>Sequence Analysis, DNA</subject><subject>Software</subject><subject>Studies</subject><issn>1553-7404</issn><issn>1553-7390</issn><issn>1553-7404</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>DOA</sourceid><recordid>eNqVk1tv0zAYhiMEYmPwDxBEQkIg0WLHp_gGqewAldZNYsCt5TqfU09uXOIEtf8eZ-2mVuIClAufnve1_Tpflr3EaIyJwB9vQ9822o9XNTRjjBDjDD3KjjFjZCQooo_3-kfZsxhvESKslOJpdlRwJiWT_DjrJvmFh7Wbe_iQn1vrjIOmyz-7Jiyd9vnMraHKZ6ECn9vQ5tMqLTu7cU2dnzlroR3GCTy7muQz6BYbrzsXmtw1yST23roO8hv41UNj7kS608-zJ1b7CC927Un24-L8--nX0eX1l-np5HJkuCy7kamokJQYrOeIQik0xUVZASuoYMSUeM4R6IJqUiFZzBkiiGKb7qilIaxKvZPs9dZ35UNUu8CiwoJKXsiC40RMt0QV9K1atW6p240K2qm7idDWSredMx6ULLXWBlNJyxSpNFpYEBxJwUtaEiKT16fdbv18CZVJwbTaH5gerjRuoerwW1HOhGQ8GbzbGbQh5RU7tXTRgPe6gdAP5yacC5lumdA3W7TW6WiusSE5mgFXE0pSeoUoSKLGf6HSV8HSmdCAdWn-QPD-QJCYDtZdrfsY1fTm23-wV__OXv88ZN_usQvQvlvE4Pvhr4qHIN2Cpg0xtmAfosZIDSVy_-JqKBG1K5Eke7X_TA-i-5ogfwD2SAoI</recordid><startdate>20151101</startdate><enddate>20151101</enddate><creator>Lea, Amanda J</creator><creator>Tung, 
Jenny</creator><creator>Zhou, Xiang</creator><general>Public Library of Science</general><general>Public Library of Science (PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISN</scope><scope>ISR</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20151101</creationdate><title>A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data</title><author>Lea, Amanda J ; Tung, Jenny ; Zhou, Xiang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c698t-cd47943c1ab04e87a4128de524753c81b60ea24a3d092b503041f035a9c35df03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Age</topic><topic>Algorithms</topic><topic>Alzheimers disease</topic><topic>Analysis</topic><topic>Arabidopsis thaliana</topic><topic>CpG Islands - genetics</topic><topic>Datasets</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>DNA methylation</topic><topic>DNA Methylation - genetics</topic><topic>Epigenetics</topic><topic>Experiments</topic><topic>Gene expression</topic><topic>Genetic aspects</topic><topic>Genomes</topic><topic>Grants</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Humans</topic><topic>Methods</topic><topic>Methylation</topic><topic>Parameter estimation</topic><topic>Physiological aspects</topic><topic>Population</topic><topic>Sequence Analysis, DNA</topic><topic>Software</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lea, Amanda J</creatorcontrib><creatorcontrib>Tung, Jenny</creatorcontrib><creatorcontrib>Zhou, Xiang</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PLoS genetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lea, Amanda J</au><au>Tung, Jenny</au><au>Zhou, Xiang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data</atitle><jtitle>PLoS genetics</jtitle><addtitle>PLoS Genet</addtitle><date>2015-11-01</date><risdate>2015</risdate><volume>11</volume><issue>11</issue><spage>e1005650</spage><epage>e1005650</epage><pages>e1005650-e1005650</pages><issn>1553-7404</issn><issn>1553-7390</issn><eissn>1553-7404</eissn><abstract>Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and because of the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. 
This framework allows us to simultaneously account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>26599596</pmid><doi>10.1371/journal.pgen.1005650</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1553-7404
ispartof	PLoS genetics, 2015-11, Vol.11 (11), p.e1005650-e1005650
issn	1553-7404 1553-7390 1553-7404
language	eng
recordid	cdi_plos_journals_1749629261
source	Public Library of Science (PLoS) Journals Open Access; MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central
subjects	Age; Algorithms; Alzheimers disease; Analysis; Arabidopsis thaliana; CpG Islands - genetics; Datasets; Deoxyribonucleic acid; DNA; DNA methylation; DNA Methylation - genetics; Epigenetics; Experiments; Gene expression; Genetic aspects; Genomes; Grants; High-Throughput Nucleotide Sequencing; Humans; Methods; Methylation; Parameter estimation; Physiological aspects; Population; Sequence Analysis, DNA; Software; Studies
title	A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T03%3A40%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Flexible,%20Efficient%20Binomial%20Mixed%20Model%20for%20Identifying%20Differential%20DNA%20Methylation%20in%20Bisulfite%20Sequencing%20Data&rft.jtitle=PLoS%20genetics&rft.au=Lea,%20Amanda%20J&rft.date=2015-11-01&rft.volume=11&rft.issue=11&rft.spage=e1005650&rft.epage=e1005650&rft.pages=e1005650-e1005650&rft.issn=1553-7404&rft.eissn=1553-7404&rft_id=info:doi/10.1371/journal.pgen.1005650&rft_dat=%3Cgale_plos_%3EA436982723%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1736679041&rft_id=info:pmid/26599596&rft_galeid=A436982723&rft_doaj_id=oai_doaj_org_article_98aaac149487409ca7fe760976848339&rfr_iscdi=true