A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing
The genome of many species in the biosphere is a diploid consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencin...
Published in: | PloS one 2016-11, Vol.11 (11), p.e0166721-e0166721 |
---|---|
Main authors: | Ting, Chuan-Kang; Lin, Choun-Sea; Chan, Ming-Tsai; Chen, Jian-Wei; Chuang, Sheng-Yu; Huang, Yao-Ting |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e0166721 |
---|---|
container_issue | 11 |
container_start_page | e0166721 |
container_title | PloS one |
container_volume | 11 |
creator | Ting, Chuan-Kang; Lin, Choun-Sea; Chan, Ming-Tsai; Chen, Jian-Wei; Chuang, Sheng-Yu; Huang, Yao-Ting |
description | The genome of many species in the biosphere is diploid, consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct one consensus sequence, which is a mosaic of the two parental haplotypes. Reconstructing paternal and maternal haplotypes is an important task in linkage analysis and association studies. This study designs and implements HapSVAssembler on the basis of a genetic algorithm (GA) and paired-end sequencing. The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with the GA. Experimental results indicate that HapSVAssembler has high accuracy and contiguity under various sequencing coverages, error rates, and insert sizes. The program is tested on pilot sequencing of a highly heterozygous genome, and 12,781 heterozygous SNPs and 602 hemizygous SVs are identified. We observe that, although the number of SVs is much smaller than that of SNPs, the genomic regions occupied by SVs are much larger, implying that heterozygosity computed using SNPs or the k-mer spectrum may be underestimated. |
doi_str_mv | 10.1371/journal.pone.0166721 |
format | Article |
fullrecord | <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_1841403201</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A471881543</galeid><doaj_id>oai_doaj_org_article_e78629e5cf2148c0839bfc9133a6ec3c</doaj_id><sourcerecordid>A471881543</sourcerecordid><originalsourceid>FETCH-LOGICAL-c725t-9af425e8abe237f28f7b1516c05375436cf3821e4172018f1a1c3c993cdd6f523</originalsourceid><addsrcrecordid>eNqNk8Fu1DAQhiMEoqXwBggiISE47OKxE8e5IK1KKStVKrSUq-V1xllXSby1EwRvj7ebVhvUQ5VDovE3_4z_ySTJayBzYAV8unaD71Qz37gO5wQ4Lyg8SQ6hZHTGKWFP974PkhchXBOSM8H58-SAFoJDzslh8mORnmKHvdXpoqmdt_26TY3z6Re7aZyttqeuxfQCtetC7wfdW9elV8F2dfpdWY_V7KSr0ku8GbDTMfoyeWZUE_DV-D5Krr6e_Dz-Njs7P10eL85muqB5PyuVyWiOQq2QssJQYYoV5MB1bLLIM8a1YYICZlBQAsKAAs10WTJdVdzklB0lb3e6sc8gRzeCBJFBRljMicRyR1ROXcuNt63yf6VTVt4GnK-l8vHmDUqMhtASc20oZEITwcqV0SUwpjjGulHr81htWLVYaex6r5qJ6PSks2tZu98yB8gFYVHgwyjgXbQq9LK1QWPTqA7dcNs3ZyVjsfQjUIiSQHhE3_2HPmzESNUq3tV2xsUW9VZULrIChIBoeKTmD1DxqbC1cfpobIxPEj5OEiLT45--VkMIcnl58Xj2_NeUfb_HrlE1_Tq4Ztj-eWEKZjtQexeCR3M_DyByuyR3bsjtkshxSWLam_1Z3ifdbQX7By86Ccs</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1841403201</pqid></control><display><type>article</type><title>A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing</title><source>Public Library of Science (PLoS) Journals Open Access</source><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Ting, Chuan-Kang ; Lin, Choun-Sea ; Chan, Ming-Tsai ; Chen, Jian-Wei ; Chuang, Sheng-Yu ; Huang, Yao-Ting</creator><contributor>Xu, Peng</contributor><creatorcontrib>Ting, Chuan-Kang ; Lin, Choun-Sea ; Chan, Ming-Tsai ; Chen, Jian-Wei ; Chuang, Sheng-Yu ; Huang, Yao-Ting ; Xu, Peng</creatorcontrib><description>The genome of many 
species in the biosphere is a diploid consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct one consensus sequence, which is a mosaic of two parental haplotypes. Reconstructing paternal and maternal haplotypes is an important task in linkage analysis and association studies. This study designs and implemented HapSVAssembler on the basis of Genetic Algorithm (GA) and paired-end sequencing. The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with a GA algorithm. Experimental results indicate that the HapSVAssembler has high accuracy and contiguity under various sequencing coverage, error rates, and insert sizes. The program is tested on pilot sequencing of a highly heterozygous genome, and 12,781 heterozygous SNPs and 602 hemizygous SVs are identified. 
We observe that, although the number of SVs is much less than that of SNPs, the genomic regions occupied by SVs are much larger, implying the heterozygosity computed using SNPs or k-mer spectrum may be under-estimated.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0166721</identifier><identifier>PMID: 27861560</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Agricultural biotechnology ; Algorithms ; Analysis ; Bioinformatics ; Biology and Life Sciences ; Biosphere ; Chromosomes ; Computational Biology - methods ; Computer science ; Computer Simulation ; Conserved sequence ; Diploidy ; Error correction & detection ; Evolution, Molecular ; Gene sequencing ; Genetic algorithms ; Genetic aspects ; Genome ; Genomes ; Genomics ; Genomics - methods ; Haplotypes ; Heterozygosity ; Heterozygote ; High-Throughput Nucleotide Sequencing ; Linkage analysis ; Methods ; Mutation ; Nucleotide sequence ; Optimization ; Parenting ; Polymorphism, Single Nucleotide ; Problems ; Reproducibility of Results ; Research and analysis methods ; Researchers ; Sequence Analysis, DNA ; Single nucleotide polymorphisms ; Single-nucleotide polymorphism ; Software</subject><ispartof>PloS one, 2016-11, Vol.11 (11), p.e0166721-e0166721</ispartof><rights>COPYRIGHT 2016 Public Library of Science</rights><rights>2016 Ting et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2016 Ting et al 2016 Ting et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c725t-9af425e8abe237f28f7b1516c05375436cf3821e4172018f1a1c3c993cdd6f523</citedby><cites>FETCH-LOGICAL-c725t-9af425e8abe237f28f7b1516c05375436cf3821e4172018f1a1c3c993cdd6f523</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5115803/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5115803/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2096,2915,23845,27901,27902,53766,53768,79343,79344</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/27861560$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Xu, Peng</contributor><creatorcontrib>Ting, Chuan-Kang</creatorcontrib><creatorcontrib>Lin, Choun-Sea</creatorcontrib><creatorcontrib>Chan, Ming-Tsai</creatorcontrib><creatorcontrib>Chen, Jian-Wei</creatorcontrib><creatorcontrib>Chuang, Sheng-Yu</creatorcontrib><creatorcontrib>Huang, Yao-Ting</creatorcontrib><title>A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>The genome of many species in the biosphere is a diploid consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). 
Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct one consensus sequence, which is a mosaic of two parental haplotypes. Reconstructing paternal and maternal haplotypes is an important task in linkage analysis and association studies. This study designs and implemented HapSVAssembler on the basis of Genetic Algorithm (GA) and paired-end sequencing. The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with a GA algorithm. Experimental results indicate that the HapSVAssembler has high accuracy and contiguity under various sequencing coverage, error rates, and insert sizes. The program is tested on pilot sequencing of a highly heterozygous genome, and 12,781 heterozygous SNPs and 602 hemizygous SVs are identified. We observe that, although the number of SVs is much less than that of SNPs, the genomic regions occupied by SVs are much larger, implying the heterozygosity computed using SNPs or k-mer spectrum may be under-estimated.</description><subject>Agricultural biotechnology</subject><subject>Algorithms</subject><subject>Analysis</subject><subject>Bioinformatics</subject><subject>Biology and Life Sciences</subject><subject>Biosphere</subject><subject>Chromosomes</subject><subject>Computational Biology - methods</subject><subject>Computer science</subject><subject>Computer Simulation</subject><subject>Conserved sequence</subject><subject>Diploidy</subject><subject>Error correction & detection</subject><subject>Evolution, Molecular</subject><subject>Gene sequencing</subject><subject>Genetic algorithms</subject><subject>Genetic aspects</subject><subject>Genome</subject><subject>Genomes</subject><subject>Genomics</subject><subject>Genomics - methods</subject><subject>Haplotypes</subject><subject>Heterozygosity</subject><subject>Heterozygote</subject><subject>High-Throughput Nucleotide 
Sequencing</subject><subject>Linkage analysis</subject><subject>Methods</subject><subject>Mutation</subject><subject>Nucleotide sequence</subject><subject>Optimization</subject><subject>Parenting</subject><subject>Polymorphism, Single Nucleotide</subject><subject>Problems</subject><subject>Reproducibility of Results</subject><subject>Research and analysis methods</subject><subject>Researchers</subject><subject>Sequence Analysis, DNA</subject><subject>Single nucleotide polymorphisms</subject><subject>Single-nucleotide polymorphism</subject><subject>Software</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><sourceid>DOA</sourceid><recordid>eNqNk8Fu1DAQhiMEoqXwBggiISE47OKxE8e5IK1KKStVKrSUq-V1xllXSby1EwRvj7ebVhvUQ5VDovE3_4z_ySTJayBzYAV8unaD71Qz37gO5wQ4Lyg8SQ6hZHTGKWFP974PkhchXBOSM8H58-SAFoJDzslh8mORnmKHvdXpoqmdt_26TY3z6Re7aZyttqeuxfQCtetC7wfdW9elV8F2dfpdWY_V7KSr0ku8GbDTMfoyeWZUE_DV-D5Krr6e_Dz-Njs7P10eL85muqB5PyuVyWiOQq2QssJQYYoV5MB1bLLIM8a1YYICZlBQAsKAAs10WTJdVdzklB0lb3e6sc8gRzeCBJFBRljMicRyR1ROXcuNt63yf6VTVt4GnK-l8vHmDUqMhtASc20oZEITwcqV0SUwpjjGulHr81htWLVYaex6r5qJ6PSks2tZu98yB8gFYVHgwyjgXbQq9LK1QWPTqA7dcNs3ZyVjsfQjUIiSQHhE3_2HPmzESNUq3tV2xsUW9VZULrIChIBoeKTmD1DxqbC1cfpobIxPEj5OEiLT45--VkMIcnl58Xj2_NeUfb_HrlE1_Tq4Ztj-eWEKZjtQexeCR3M_DyByuyR3bsjtkshxSWLam_1Z3ifdbQX7By86Ccs</recordid><startdate>20161118</startdate><enddate>20161118</enddate><creator>Ting, Chuan-Kang</creator><creator>Lin, Choun-Sea</creator><creator>Chan, Ming-Tsai</creator><creator>Chen, Jian-Wei</creator><creator>Chuang, Sheng-Yu</creator><creator>Huang, Yao-Ting</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20161118</creationdate><title>A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing</title><author>Ting, Chuan-Kang ; Lin, Choun-Sea ; Chan, Ming-Tsai ; Chen, Jian-Wei ; Chuang, Sheng-Yu ; Huang, 
Yao-Ting</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c725t-9af425e8abe237f28f7b1516c05375436cf3821e4172018f1a1c3c993cdd6f523</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Agricultural biotechnology</topic><topic>Algorithms</topic><topic>Analysis</topic><topic>Bioinformatics</topic><topic>Biology and Life Sciences</topic><topic>Biosphere</topic><topic>Chromosomes</topic><topic>Computational Biology - methods</topic><topic>Computer science</topic><topic>Computer Simulation</topic><topic>Conserved sequence</topic><topic>Diploidy</topic><topic>Error correction & detection</topic><topic>Evolution, Molecular</topic><topic>Gene sequencing</topic><topic>Genetic algorithms</topic><topic>Genetic aspects</topic><topic>Genome</topic><topic>Genomes</topic><topic>Genomics</topic><topic>Genomics - methods</topic><topic>Haplotypes</topic><topic>Heterozygosity</topic><topic>Heterozygote</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Linkage analysis</topic><topic>Methods</topic><topic>Mutation</topic><topic>Nucleotide sequence</topic><topic>Optimization</topic><topic>Parenting</topic><topic>Polymorphism, Single Nucleotide</topic><topic>Problems</topic><topic>Reproducibility of Results</topic><topic>Research and analysis methods</topic><topic>Researchers</topic><topic>Sequence Analysis, DNA</topic><topic>Single nucleotide polymorphisms</topic><topic>Single-nucleotide polymorphism</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ting, Chuan-Kang</creatorcontrib><creatorcontrib>Lin, Choun-Sea</creatorcontrib><creatorcontrib>Chan, Ming-Tsai</creatorcontrib><creatorcontrib>Chen, Jian-Wei</creatorcontrib><creatorcontrib>Chuang, Sheng-Yu</creatorcontrib><creatorcontrib>Huang, Yao-Ting</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE 
(Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Proquest Nursing & Allied Health Source</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological & Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>Agricultural & Environmental Science Collection</collection><collection>ProQuest 
Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Materials Science Database</collection><collection>Nursing & Allied Health Database (Alumni Edition)</collection><collection>Meteorological & Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Agricultural Science Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Nursing & Allied Health Premium</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One 
Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ting, Chuan-Kang</au><au>Lin, Choun-Sea</au><au>Chan, Ming-Tsai</au><au>Chen, Jian-Wei</au><au>Chuang, Sheng-Yu</au><au>Huang, Yao-Ting</au><au>Xu, Peng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2016-11-18</date><risdate>2016</risdate><volume>11</volume><issue>11</issue><spage>e0166721</spage><epage>e0166721</epage><pages>e0166721-e0166721</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>The genome of many species in the biosphere is a diploid consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct one consensus sequence, which is a mosaic of two parental haplotypes. Reconstructing paternal and maternal haplotypes is an important task in linkage analysis and association studies. This study designs and implemented HapSVAssembler on the basis of Genetic Algorithm (GA) and paired-end sequencing. 
The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with a GA algorithm. Experimental results indicate that the HapSVAssembler has high accuracy and contiguity under various sequencing coverage, error rates, and insert sizes. The program is tested on pilot sequencing of a highly heterozygous genome, and 12,781 heterozygous SNPs and 602 hemizygous SVs are identified. We observe that, although the number of SVs is much less than that of SNPs, the genomic regions occupied by SVs are much larger, implying the heterozygosity computed using SNPs or k-mer spectrum may be under-estimated.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>27861560</pmid><doi>10.1371/journal.pone.0166721</doi><tpages>e0166721</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1932-6203 |
ispartof | PloS one, 2016-11, Vol.11 (11), p.e0166721-e0166721 |
issn | 1932-6203 1932-6203 |
language | eng |
recordid | cdi_plos_journals_1841403201 |
source | Public Library of Science (PLoS) Journals Open Access; MEDLINE; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Agricultural biotechnology; Algorithms; Analysis; Bioinformatics; Biology and Life Sciences; Biosphere; Chromosomes; Computational Biology - methods; Computer science; Computer Simulation; Conserved sequence; Diploidy; Error correction & detection; Evolution, Molecular; Gene sequencing; Genetic algorithms; Genetic aspects; Genome; Genomes; Genomics; Genomics - methods; Haplotypes; Heterozygosity; Heterozygote; High-Throughput Nucleotide Sequencing; Linkage analysis; Methods; Mutation; Nucleotide sequence; Optimization; Parenting; Polymorphism, Single Nucleotide; Problems; Reproducibility of Results; Research and analysis methods; Researchers; Sequence Analysis, DNA; Single nucleotide polymorphisms; Single-nucleotide polymorphism; Software |
title | A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T23%3A47%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Genetic%20Algorithm%20for%20Diploid%20Genome%20Reconstruction%20Using%20Paired-End%20Sequencing&rft.jtitle=PloS%20one&rft.au=Ting,%20Chuan-Kang&rft.date=2016-11-18&rft.volume=11&rft.issue=11&rft.spage=e0166721&rft.epage=e0166721&rft.pages=e0166721-e0166721&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0166721&rft_dat=%3Cgale_plos_%3EA471881543%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1841403201&rft_id=info:pmid/27861560&rft_galeid=A471881543&rft_doaj_id=oai_doaj_org_article_e78629e5cf2148c0839bfc9133a6ec3c&rfr_iscdi=true |
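
The description above says that the haplotypes are reconstructed by solving an optimization problem with a genetic algorithm, but this record does not give the objective or the operators. The following is only a toy sketch of that general idea and not HapSVAssembler's actual method. It assumes biallelic heterozygous sites coded 0/1, reads given as mappings from site index to observed allele, and a minimum-error-correction (MEC) style cost. The names `mec_cost` and `phase_with_ga` and all parameter values are hypothetical.

```python
import random


def mec_cost(hap, reads):
    """Assumed objective: charge each read its mismatches against the
    closer of the candidate haplotype and its complement."""
    comp = [1 - a for a in hap]
    total = 0
    for read in reads:  # read: dict {site_index: observed allele 0/1}
        d1 = sum(hap[i] != a for i, a in read.items())
        d2 = sum(comp[i] != a for i, a in read.items())
        total += min(d1, d2)
    return total


def phase_with_ga(reads, n_sites, pop_size=40, generations=60,
                  mutation_rate=0.05, seed=0):
    """Toy GA: elitist selection, one-point crossover, bit-flip mutation.
    Requires n_sites >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_sites)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half of the population unchanged (elitism).
        pop.sort(key=lambda h: mec_cost(h, reads))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_sites)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - a if rng.random() < mutation_rate else a
                     for a in child]
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda h: mec_cost(h, reads))
    # The second haplotype is the complement at heterozygous sites.
    return best, [1 - a for a in best]


if __name__ == "__main__":
    # Three hypothetical read pairs covering six heterozygous sites.
    demo_reads = [{0: 0, 1: 1, 2: 1}, {2: 0, 3: 1, 4: 0}, {3: 0, 4: 1, 5: 0}]
    h1, h2 = phase_with_ga(demo_reads, n_sites=6)
    print(h1, h2, mec_cost(h1, demo_reads))
```

Reads from paired-end libraries fit this representation because the two mates of a pair can cover distant sites in the same dictionary. Hemizygous SVs, which the abstract also mentions, are outside the scope of this sketch.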