A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing

The genomes of many species in the biosphere are diploid, consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencin...

Bibliographic details
Published in: PloS one 2016-11, Vol.11 (11), p.e0166721-e0166721
Main authors: Ting, Chuan-Kang, Lin, Choun-Sea, Chan, Ming-Tsai, Chen, Jian-Wei, Chuang, Sheng-Yu, Huang, Yao-Ting
Format: Article
Language: eng
Online access: Full text
container_end_page e0166721
container_issue 11
container_start_page e0166721
container_title PloS one
container_volume 11
creator Ting, Chuan-Kang
Lin, Choun-Sea
Chan, Ming-Tsai
Chen, Jian-Wei
Chuang, Sheng-Yu
Huang, Yao-Ting
description The genomes of many species in the biosphere are diploid, consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct one consensus sequence, which is a mosaic of the two parental haplotypes. Reconstructing the paternal and maternal haplotypes is an important task in linkage analysis and association studies. This study designs and implements HapSVAssembler on the basis of a genetic algorithm (GA) and paired-end sequencing. The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with a GA. Experimental results indicate that HapSVAssembler achieves high accuracy and contiguity under various sequencing coverages, error rates, and insert sizes. The program was tested on pilot sequencing of a highly heterozygous genome, in which 12,781 heterozygous SNPs and 602 hemizygous SVs were identified. We observe that, although SVs are far fewer than SNPs, the genomic regions occupied by SVs are much larger, implying that heterozygosity computed from SNPs or the k-mer spectrum may be underestimated.
doi_str_mv 10.1371/journal.pone.0166721
format Article
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2016-11, Vol.11 (11), p.e0166721-e0166721
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_1841403201
source Public Library of Science (PLoS) Journals Open Access; MEDLINE; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects Agricultural biotechnology
Algorithms
Analysis
Bioinformatics
Biology and Life Sciences
Biosphere
Chromosomes
Computational Biology - methods
Computer science
Computer Simulation
Conserved sequence
Diploidy
Error correction & detection
Evolution, Molecular
Gene sequencing
Genetic algorithms
Genetic aspects
Genome
Genomes
Genomics
Genomics - methods
Haplotypes
Heterozygosity
Heterozygote
High-Throughput Nucleotide Sequencing
Linkage analysis
Methods
Mutation
Nucleotide sequence
Optimization
Parenting
Polymorphism, Single Nucleotide
Problems
Reproducibility of Results
Research and analysis methods
Researchers
Sequence Analysis, DNA
Single nucleotide polymorphisms
Single-nucleotide polymorphism
Software
title A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T23%3A47%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Genetic%20Algorithm%20for%20Diploid%20Genome%20Reconstruction%20Using%20Paired-End%20Sequencing&rft.jtitle=PloS%20one&rft.au=Ting,%20Chuan-Kang&rft.date=2016-11-18&rft.volume=11&rft.issue=11&rft.spage=e0166721&rft.epage=e0166721&rft.pages=e0166721-e0166721&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0166721&rft_dat=%3Cgale_plos_%3EA471881543%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1841403201&rft_id=info:pmid/27861560&rft_galeid=A471881543&rft_doaj_id=oai_doaj_org_article_e78629e5cf2148c0839bfc9133a6ec3c&rfr_iscdi=true
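
The description above says that the haplotypes are reconstructed by solving an optimization problem with a GA, but this record does not state the objective function or the genetic operators. The following is only a minimal sketch of the general idea, under loud assumptions: it substitutes the widely used minimum error correction (MEC) objective, handles SNP sites only (no SVs), and uses tournament selection, uniform crossover, and bit-flip mutation. The names `mec_cost` and `phase_with_ga`, the fragment encoding, and all parameter values are hypothetical and are not taken from HapSVAssembler.

```python
import random


def mec_cost(hap, fragments):
    """MEC cost of a candidate haplotype (assumed objective, not the paper's).

    hap       -- list of 0/1 alleles, one per heterozygous site
    fragments -- list of dicts {site_index: observed_allele}
    Each fragment is charged its mismatches against whichever of the two
    haplotypes (hap or its bitwise complement) it agrees with better.
    """
    total = 0
    for frag in fragments:
        mismatches = sum(1 for site, allele in frag.items() if hap[site] != allele)
        total += min(mismatches, len(frag) - mismatches)
    return total


def phase_with_ga(fragments, n_sites, pop_size=40, generations=60,
                  mutation_rate=0.02, seed=0):
    """Search for a low-MEC haplotype with a simple generational GA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(pop_size)]
    best = min(pop, key=lambda h: mec_cost(h, fragments))

    def tournament():
        # Binary tournament: keep the better of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if mec_cost(a, fragments) <= mec_cost(b, fragments) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover followed by bit-flip mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=lambda h: mec_cost(h, fragments))
    return best, mec_cost(best, fragments)


# Toy paired-end fragments over four heterozygous sites (made-up data).
toy_fragments = [{0: 0, 1: 1}, {1: 0, 2: 0}, {2: 1, 3: 0}]
haplotype, cost = phase_with_ga(toy_fragments, n_sites=4)
other_haplotype = [1 - g for g in haplotype]  # the second parental haplotype
```

Because the two parental haplotypes are complementary at heterozygous SNP sites, a single bit string is enough to encode both, which keeps the search space small; a real implementation would also need to represent hemizygous SVs, which this sketch does not attempt.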