Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP

The recent development and availability of different genotype by sequencing (GBS) protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with...

Detailed description

Bibliographic details
Published in: BMC genomics 2016-08, Vol.17 Suppl 5 (Suppl 5), p.498-498, Article 498
Main authors: Perea, Claudia; De La Hoz, Juan Fernando; Cruz, Daniel Felipe; Lobaton, Juan David; Izquierdo, Paulo; Quintero, Juan Camilo; Raatz, Bodo; Duitama, Jorge
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 498
container_issue Suppl 5
container_start_page 498
container_title BMC genomics
container_volume 17 Suppl 5
creator Perea, Claudia
De La Hoz, Juan Fernando
Cruz, Daniel Felipe
Lobaton, Juan David
Izquierdo, Paulo
Quintero, Juan Camilo
Raatz, Bodo
Duitama, Jorge
description The recent development and availability of different genotype by sequencing (GBS) protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes, to generate sequencing fragments at predictable and reproducible sites. This allows to genotype thousands of genetic markers on populations with hundreds of individuals. Because GBS protocols achieve parallel genotyping through high throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data. Here we present the latest functionalities implemented in NGSEP in the context of the analysis of GBS data. We implemented a one step wizard to perform parallel read alignment, variants identification and genotyping from HTS reads sequenced from entire populations. We added different filters for variants, samples and genotype calls as well as calculation of summary statistics overall and per sample, and diversity statistics per site. NGSEP includes a module to translate genotype calls to some of the most widely used input formats for integration with several tools to perform downstream analyses such as population structure analysis, construction of genetic maps, genetic mapping of complex traits and phenotype prediction for genomic selection. We assessed the accuracy of NGSEP on two highly heterozygous F1 cassava populations and on an inbred common bean population, and we showed that NGSEP provides similar or better accuracy compared to other widely used software packages for variants detection such as GATK, Samtools and Tassel. 
NGSEP is a powerful, accurate and efficient bioinformatics software tool for analysis of HTS data, and also one of the best bioinformatic packages to facilitate the analysis and to maximize the genomic variability information that can be obtained from GBS experiments for population genomics.
doi_str_mv 10.1186/s12864-016-2827-7
format Article
fullrecord <record><control><sourceid>gale_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5009557</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A464263897</galeid><sourcerecordid>A464263897</sourcerecordid><originalsourceid>FETCH-LOGICAL-c500t-cefb271eb5f3495bedef74a5eaeb5bcf5d94ebc2ac3a16513952cc3e5c6a9a553</originalsourceid><addsrcrecordid>eNptks1O3DAUha2qqFDoA3RTWeoGFqH-iZ1kUwkQDEiorZh2bTnOdXCV2EPsaTtvj6MBxEiVF7auv3Ose30Q-kjJKaW1_BIpq2VZECoLVrOqqN6gA1pWtGBUlm9fnffR-xh_E0Krmol3aJ9VohYNkwfo6twF522YRp2cwdrrYRNdxMHiHnxImxXgdoMjPKzBG-d7fLw4X57gTieN_7p0j78tlpc_jtCe1UOED0_7Ifp1dfnz4rq4_b64uTi7LYwgJBUGbMsqCq2wvGxECx3YqtQCdC61xoquKaE1TBuuqRSUN4IZw0EYqRstBD9EX7e-q3U7QmfAp0kPajW5UU8bFbRTuzfe3as-_FH5-UaIKhscPxlMIbcUkxpdNDAM2kNYR0VrKiXnJSEZ_bxFez2AmoeUHc2Mq7NSlkzyupkNT_9D5dXB6EzwYF2u7whOdgSZSfAv9Xodo7pZ3u2ydMuaKcQ4gX3plBI1R0BtI6ByBNQcATVrPr0e0Yvi-c_5I0sAq3E</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1816633400</pqid></control><display><type>article</type><title>Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Springer Nature OA/Free Journals</source><source>SpringerLink Journals - AutoHoldings</source><source>PubMed Central Open Access</source><creator>Perea, Claudia ; De La Hoz, Juan Fernando ; Cruz, Daniel Felipe ; Lobaton, Juan David ; Izquierdo, Paulo ; Quintero, Juan Camilo ; Raatz, Bodo ; Duitama, Jorge</creator><creatorcontrib>Perea, Claudia ; De La Hoz, Juan Fernando ; Cruz, Daniel Felipe ; Lobaton, Juan David ; Izquierdo, Paulo ; Quintero, Juan Camilo ; Raatz, Bodo ; Duitama, Jorge</creatorcontrib><description>The recent development and availability of different genotype by sequencing (GBS) 
protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes, to generate sequencing fragments at predictable and reproducible sites. This allows to genotype thousands of genetic markers on populations with hundreds of individuals. Because GBS protocols achieve parallel genotyping through high throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data. Here we present the latest functionalities implemented in NGSEP in the context of the analysis of GBS data. We implemented a one step wizard to perform parallel read alignment, variants identification and genotyping from HTS reads sequenced from entire populations. We added different filters for variants, samples and genotype calls as well as calculation of summary statistics overall and per sample, and diversity statistics per site. NGSEP includes a module to translate genotype calls to some of the most widely used input formats for integration with several tools to perform downstream analyses such as population structure analysis, construction of genetic maps, genetic mapping of complex traits and phenotype prediction for genomic selection. We assessed the accuracy of NGSEP on two highly heterozygous F1 cassava populations and on an inbred common bean population, and we showed that NGSEP provides similar or better accuracy compared to other widely used software packages for variants detection such as GATK, Samtools and Tassel. 
NGSEP is a powerful, accurate and efficient bioinformatics software tool for analysis of HTS data, and also one of the best bioinformatic packages to facilitate the analysis and to maximize the genomic variability information that can be obtained from GBS experiments for population genomics.</description><identifier>ISSN: 1471-2164</identifier><identifier>EISSN: 1471-2164</identifier><identifier>DOI: 10.1186/s12864-016-2827-7</identifier><identifier>PMID: 27585926</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Computational Biology ; Genes, Plant ; Genotype ; Genotyping Techniques ; High-Throughput Nucleotide Sequencing ; Identification and classification ; Manihot - genetics ; Phaseolus - genetics ; Population genetics ; Sequence Analysis, DNA ; Single nucleotide polymorphisms</subject><ispartof>BMC genomics, 2016-08, Vol.17 Suppl 5 (Suppl 5), p.498-498, Article 498</ispartof><rights>COPYRIGHT 2016 BioMed Central Ltd.</rights><rights>The Author(s) 2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c500t-cefb271eb5f3495bedef74a5eaeb5bcf5d94ebc2ac3a16513952cc3e5c6a9a553</citedby><cites>FETCH-LOGICAL-c500t-cefb271eb5f3495bedef74a5eaeb5bcf5d94ebc2ac3a16513952cc3e5c6a9a553</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009557/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009557/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/27585926$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Perea, 
Claudia</creatorcontrib><creatorcontrib>De La Hoz, Juan Fernando</creatorcontrib><creatorcontrib>Cruz, Daniel Felipe</creatorcontrib><creatorcontrib>Lobaton, Juan David</creatorcontrib><creatorcontrib>Izquierdo, Paulo</creatorcontrib><creatorcontrib>Quintero, Juan Camilo</creatorcontrib><creatorcontrib>Raatz, Bodo</creatorcontrib><creatorcontrib>Duitama, Jorge</creatorcontrib><title>Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP</title><title>BMC genomics</title><addtitle>BMC Genomics</addtitle><description>The recent development and availability of different genotype by sequencing (GBS) protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes, to generate sequencing fragments at predictable and reproducible sites. This allows to genotype thousands of genetic markers on populations with hundreds of individuals. Because GBS protocols achieve parallel genotyping through high throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data. Here we present the latest functionalities implemented in NGSEP in the context of the analysis of GBS data. We implemented a one step wizard to perform parallel read alignment, variants identification and genotyping from HTS reads sequenced from entire populations. We added different filters for variants, samples and genotype calls as well as calculation of summary statistics overall and per sample, and diversity statistics per site. 
NGSEP includes a module to translate genotype calls to some of the most widely used input formats for integration with several tools to perform downstream analyses such as population structure analysis, construction of genetic maps, genetic mapping of complex traits and phenotype prediction for genomic selection. We assessed the accuracy of NGSEP on two highly heterozygous F1 cassava populations and on an inbred common bean population, and we showed that NGSEP provides similar or better accuracy compared to other widely used software packages for variants detection such as GATK, Samtools and Tassel. NGSEP is a powerful, accurate and efficient bioinformatics software tool for analysis of HTS data, and also one of the best bioinformatic packages to facilitate the analysis and to maximize the genomic variability information that can be obtained from GBS experiments for population genomics.</description><subject>Computational Biology</subject><subject>Genes, Plant</subject><subject>Genotype</subject><subject>Genotyping Techniques</subject><subject>High-Throughput Nucleotide Sequencing</subject><subject>Identification and classification</subject><subject>Manihot - genetics</subject><subject>Phaseolus - genetics</subject><subject>Population genetics</subject><subject>Sequence Analysis, DNA</subject><subject>Single nucleotide 
polymorphisms</subject><issn>1471-2164</issn><issn>1471-2164</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNptks1O3DAUha2qqFDoA3RTWeoGFqH-iZ1kUwkQDEiorZh2bTnOdXCV2EPsaTtvj6MBxEiVF7auv3Ose30Q-kjJKaW1_BIpq2VZECoLVrOqqN6gA1pWtGBUlm9fnffR-xh_E0Krmol3aJ9VohYNkwfo6twF522YRp2cwdrrYRNdxMHiHnxImxXgdoMjPKzBG-d7fLw4X57gTieN_7p0j78tlpc_jtCe1UOED0_7Ifp1dfnz4rq4_b64uTi7LYwgJBUGbMsqCq2wvGxECx3YqtQCdC61xoquKaE1TBuuqRSUN4IZw0EYqRstBD9EX7e-q3U7QmfAp0kPajW5UU8bFbRTuzfe3as-_FH5-UaIKhscPxlMIbcUkxpdNDAM2kNYR0VrKiXnJSEZ_bxFez2AmoeUHc2Mq7NSlkzyupkNT_9D5dXB6EzwYF2u7whOdgSZSfAv9Xodo7pZ3u2ydMuaKcQ4gX3plBI1R0BtI6ByBNQcATVrPr0e0Yvi-c_5I0sAq3E</recordid><startdate>20160831</startdate><enddate>20160831</enddate><creator>Perea, Claudia</creator><creator>De La Hoz, Juan Fernando</creator><creator>Cruz, Daniel Felipe</creator><creator>Lobaton, Juan David</creator><creator>Izquierdo, Paulo</creator><creator>Quintero, Juan Camilo</creator><creator>Raatz, Bodo</creator><creator>Duitama, Jorge</creator><general>BioMed Central Ltd</general><general>BioMed Central</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20160831</creationdate><title>Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP</title><author>Perea, Claudia ; De La Hoz, Juan Fernando ; Cruz, Daniel Felipe ; Lobaton, Juan David ; Izquierdo, Paulo ; Quintero, Juan Camilo ; Raatz, Bodo ; Duitama, Jorge</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c500t-cefb271eb5f3495bedef74a5eaeb5bcf5d94ebc2ac3a16513952cc3e5c6a9a553</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Computational 
Biology</topic><topic>Genes, Plant</topic><topic>Genotype</topic><topic>Genotyping Techniques</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Identification and classification</topic><topic>Manihot - genetics</topic><topic>Phaseolus - genetics</topic><topic>Population genetics</topic><topic>Sequence Analysis, DNA</topic><topic>Single nucleotide polymorphisms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Perea, Claudia</creatorcontrib><creatorcontrib>De La Hoz, Juan Fernando</creatorcontrib><creatorcontrib>Cruz, Daniel Felipe</creatorcontrib><creatorcontrib>Lobaton, Juan David</creatorcontrib><creatorcontrib>Izquierdo, Paulo</creatorcontrib><creatorcontrib>Quintero, Juan Camilo</creatorcontrib><creatorcontrib>Raatz, Bodo</creatorcontrib><creatorcontrib>Duitama, Jorge</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>BMC genomics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Perea, Claudia</au><au>De La Hoz, Juan Fernando</au><au>Cruz, Daniel Felipe</au><au>Lobaton, Juan David</au><au>Izquierdo, Paulo</au><au>Quintero, Juan Camilo</au><au>Raatz, Bodo</au><au>Duitama, Jorge</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP</atitle><jtitle>BMC genomics</jtitle><addtitle>BMC Genomics</addtitle><date>2016-08-31</date><risdate>2016</risdate><volume>17 Suppl 5</volume><issue>Suppl 
5</issue><spage>498</spage><epage>498</epage><pages>498-498</pages><artnum>498</artnum><issn>1471-2164</issn><eissn>1471-2164</eissn><abstract>The recent development and availability of different genotype by sequencing (GBS) protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes, to generate sequencing fragments at predictable and reproducible sites. This allows to genotype thousands of genetic markers on populations with hundreds of individuals. Because GBS protocols achieve parallel genotyping through high throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data. Here we present the latest functionalities implemented in NGSEP in the context of the analysis of GBS data. We implemented a one step wizard to perform parallel read alignment, variants identification and genotyping from HTS reads sequenced from entire populations. We added different filters for variants, samples and genotype calls as well as calculation of summary statistics overall and per sample, and diversity statistics per site. NGSEP includes a module to translate genotype calls to some of the most widely used input formats for integration with several tools to perform downstream analyses such as population structure analysis, construction of genetic maps, genetic mapping of complex traits and phenotype prediction for genomic selection. 
We assessed the accuracy of NGSEP on two highly heterozygous F1 cassava populations and on an inbred common bean population, and we showed that NGSEP provides similar or better accuracy compared to other widely used software packages for variants detection such as GATK, Samtools and Tassel. NGSEP is a powerful, accurate and efficient bioinformatics software tool for analysis of HTS data, and also one of the best bioinformatic packages to facilitate the analysis and to maximize the genomic variability information that can be obtained from GBS experiments for population genomics.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>27585926</pmid><doi>10.1186/s12864-016-2827-7</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2164
ispartof BMC genomics, 2016-08, Vol.17 Suppl 5 (Suppl 5), p.498-498, Article 498
issn 1471-2164
1471-2164
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5009557
source MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Springer Nature OA/Free Journals; SpringerLink Journals - AutoHoldings; PubMed Central Open Access
subjects Computational Biology
Genes, Plant
Genotype
Genotyping Techniques
High-Throughput Nucleotide Sequencing
Identification and classification
Manihot - genetics
Phaseolus - genetics
Population genetics
Sequence Analysis, DNA
Single nucleotide polymorphisms
title Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T13%3A26%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bioinformatic%20analysis%20of%20genotype%20by%20sequencing%20(GBS)%20data%20with%20NGSEP&rft.jtitle=BMC%20genomics&rft.au=Perea,%20Claudia&rft.date=2016-08-31&rft.volume=17%20Suppl%205&rft.issue=Suppl%205&rft.spage=498&rft.epage=498&rft.pages=498-498&rft.artnum=498&rft.issn=1471-2164&rft.eissn=1471-2164&rft_id=info:doi/10.1186/s12864-016-2827-7&rft_dat=%3Cgale_pubme%3EA464263897%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1816633400&rft_id=info:pmid/27585926&rft_galeid=A464263897&rfr_iscdi=true