Defining the estimated core genome of bacterial populations using a Bayesian decision model

The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing da...

Bibliographic details
Published in: PLoS computational biology 2014-08, Vol.10 (8), p.e1003788-e1003788
Main authors: van Tonder, Andries J, Mistry, Shilan, Bray, James E, Hill, Dorothea M C, Cody, Alison J, Farmer, Chris L, Klugman, Keith P, von Gottberg, Anne, Bentley, Stephen D, Parkhill, Julian, Jolley, Keith A, Maiden, Martin C J, Brueggemann, Angela B
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e1003788
container_issue 8
container_start_page e1003788
container_title PLoS computational biology
container_volume 10
creator van Tonder, Andries J
Mistry, Shilan
Bray, James E
Hill, Dorothea M C
Cody, Alison J
Farmer, Chris L
Klugman, Keith P
von Gottberg, Anne
Bentley, Stephen D
Parkhill, Julian
Jolley, Keith A
Maiden, Martin C J
Brueggemann, Angela B
description The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.
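The description mentions pairwise p-distances based on nucleotide sequence diversity and a median p-distance per core gene. A minimal sketch of that calculation follows, assuming each gene's alleles are already aligned to equal length and that sites containing gaps or ambiguous bases are skipped. The function names `p_distance` and `median_p_distance` and the toy alleles are illustrative assumptions, not taken from the article, and this is not presented as the authors' pipeline.

```python
from itertools import combinations
from statistics import median

VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ.

    Assumption: sites where either sequence has a gap or an ambiguous
    base are excluded from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def median_p_distance(alleles):
    """Median of all pairwise p-distances among the alleles of one gene."""
    if len(alleles) < 2:
        raise ValueError("need at least two alleles")
    return median(p_distance(a, b) for a, b in combinations(alleles, 2))


# Hypothetical toy alleles for a single gene (not data from the study).
toy_alleles = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
print(median_p_distance(toy_alleles))
```

Repeating `median_p_distance` for every core gene and pairing each value with the gene's start coordinate would give the kind of "diversity by genome location" series the description refers to.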
doi_str_mv 10.1371/journal.pcbi.1003788
format Article
pmid 25144616
eissn 1553-7358
publisher United States: Public Library of Science
addtitle PLoS Comput Biol
date 2014-08-01
rights 2014 van Tonder et al. 2014 Public Library of Science. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited: van Tonder AJ, Mistry S, Bray JE, Hill DMC, Cody AJ, Farmer CL, et al. (2014) Defining the Estimated Core Genome of Bacterial Populations Using a Bayesian Decision Model. PLoS Comput Biol 10(8): e1003788. doi:10.1371/journal.pcbi.1003788
lds50 peer_reviewed
oa free_for_read
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4140633/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4140633/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/25144616
doaj_id oai_doaj_org_article_ecab11f7b0a24086a7f07fcad8d07f2e
pqid 1555620388
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2014-08, Vol.10 (8), p.e1003788-e1003788
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_plos_journals_1685029466
source Public Library of Science (PLoS) Journals Open Access; MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central
subjects Bacterial Proteins - genetics
Bayes Theorem
Biology and Life Sciences
Campylobacter jejuni - genetics
Databases, Genetic
Datasets
Genes
Genetic diversity
Genome, Bacterial - genetics
Genomes
Genomics
Medicine and Health Sciences
Methods
Models, Genetic
Neisseria meningitidis - genetics
Population
Public health
Streptococcus infections
Studies
title Defining the estimated core genome of bacterial populations using a Bayesian decision model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-21T19%3A32%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Defining%20the%20estimated%20core%20genome%20of%20bacterial%20populations%20using%20a%20Bayesian%20decision%20model&rft.jtitle=PLoS%20computational%20biology&rft.au=van%20Tonder,%20Andries%20J&rft.date=2014-08-01&rft.volume=10&rft.issue=8&rft.spage=e1003788&rft.epage=e1003788&rft.pages=e1003788-e1003788&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1003788&rft_dat=%3Cproquest_plos_%3E1555620388%3C/proquest_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1555620388&rft_id=info:pmid/25144616&rft_doaj_id=oai_doaj_org_article_ecab11f7b0a24086a7f07fcad8d07f2e&rfr_iscdi=true
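The link above is an OpenURL whose query string carries the citation as key/value pairs (`rft.atitle`, `rft.volume`, `rft_id`, and so on). As a rough sketch, the referent fields can be pulled out with Python's standard `urllib.parse`. The helper name `openurl_referent` is an assumption for illustration, and the sample URL is a shortened version of the one in this record.

```python
from urllib.parse import parse_qs, urlsplit


def openurl_referent(url):
    """Return the referent ("rft") fields of an OpenURL as a dict of lists.

    Keys such as rft_id can repeat, so every value is kept as a list.
    """
    query = parse_qs(urlsplit(url).query)
    return {key: values for key, values in query.items()
            if key.startswith("rft")}


# Shortened form of the resolver link in this record.
sample = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=PLoS%20computational%20biology"
    "&rft.date=2014-08-01&rft.volume=10&rft.issue=8"
    "&rft_id=info:doi/10.1371/journal.pcbi.1003788"
    "&rft_id=info:pmid/25144616"
)

fields = openurl_referent(sample)
for key, values in sorted(fields.items()):
    print(key, values)
```

Keeping values as lists avoids silently dropping the second `rft_id`, which in this record carries the PubMed identifier alongside the DOI.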