HapCUT: an efficient and accurate algorithm for the haplotype assembly problem
Motivation: The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time al...
Published in: | Bioinformatics 2008-08, Vol.24 (16), p.i153-i159 |
---|---|
Main authors: | Bansal, Vikas ; Bafna, Vineet |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | i159 |
---|---|
container_issue | 16 |
container_start_page | i153 |
container_title | Bioinformatics |
container_volume | 24 |
creator | Bansal, Vikas ; Bafna, Vineet |
description | Motivation: The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time algorithms have been proposed for restricted versions of the problem. In this article, we consider the haplotype assembly problem in the most general setting, i.e. fragments of any length and with an arbitrary number of gaps. Results: We describe a novel combinatorial approach for the haplotype assembly problem based on computing max-cuts in certain graphs derived from the sequenced fragments. Levy et al. have sequenced the complete genome of a human individual and used a greedy heuristic to assemble the haplotypes for this individual. We have applied our method HapCUT to infer haplotypes from this data and demonstrate that the haplotypes inferred using HapCUT are significantly more accurate (20–25% lower maximum error correction scores for all chromosomes) than the greedy heuristic and a previously published method, Fast Hare. We also describe a maximum likelihood based estimator of the absolute accuracy of the sequence-based haplotypes using population haplotypes from the International HapMap project. Availability: A program implementing HapCUT is available on request. Contact: vibansal@cs.ucsd.edu |
doi_str_mv | 10.1093/bioinformatics/btn298 |
format | Article |
fullrecord | <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_proquest_miscellaneous_69404103</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btn298</oup_id><sourcerecordid>21196984</sourcerecordid><originalsourceid>FETCH-LOGICAL-c588t-73d928f0454003a7f3d7de2c1e91a4dfa07359788f9b0d790625c4dc0fc3b25e3</originalsourceid><addsrcrecordid>eNqNkc1u1DAUhS0EoqXwCKCIBbvQ6_ifHRoBg1oVIVqB2FiOYzMpSRxsR2LeHlcZFcGmXdnW_c45uj4IPcfwGoMip20f-smHOJrc23Ta5qlR8gE6xpRD3QBTD8udcFFTCeQIPUnpGoBhSuljdIQll0pieYwutmbeXF2-qcxUOe9727spl0dXGWuXaLKrzPAjxD7vxqrEVXnnqp2Zh5D3c5ml5MZ22FdzDO3gxqfokTdDcs8O5wm6ev_ucrOtzz99-Lh5e15bJmWuBelUIz1QRgGIEZ50onONxU5hQztvQBCmhJRetdAJBbxhlnYWvCVtwxw5Qa9W35L7a3Ep67FP1g2DmVxYkuaKAsVA7gQJJ6ShnN4JNhgrruQN-PI_8DoscSrbaqzKvxLAuEBshWwMKUXn9Rz70cS9xqBv-tP_9qfX_oruxcF8aUfX_VUdCisArEBY5nt71qukT9n9vhWZ-FNzQQTT22_fNZHk7Mv281ctyR__kbtj</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198683011</pqid></control><display><type>article</type><title>HapCUT: an efficient and accurate algorithm for the haplotype assembly problem</title><source>Oxford Journals Open Access Collection</source><creator>Bansal, Vikas ; Bafna, Vineet</creator><creatorcontrib>Bansal, Vikas ; Bafna, Vineet</creatorcontrib><description>Motivation: The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time algorithms have been proposed for restricted versions of the problem. In this article, we consider the haplotype assembly problem in the most general setting, i.e. fragments of any length and with an arbitrary number of gaps. 
Results: We describe a novel combinatorial approach for the haplotype assembly problem based on computing max-cuts in certain graphs derived from the sequenced fragments. Levy et al. have sequenced the complete genome of a human individual and used a greedy heuristic to assemble the haplotypes for this individual. We have applied our method HapCUT to infer haplotypes from this data and demonstrate that the haplotypes inferred using HapCUT are significantly more accurate (20–25% lower maximum error correction scores for all chromosomes) than the greedy heuristic and a previously published method, Fast Hare. We also describe a maximum likelihood based estimator of the absolute accuracy of the sequence-based haplotypes using population haplotypes from the International HapMap project. Availability: A program implementing HapCUT is available on request. Contact: vibansal@cs.ucsd.edu</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btn298</identifier><identifier>PMID: 18689818</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Base Sequence ; Chromosome Mapping - methods ; Haplotypes - genetics ; Molecular Sequence Data ; Reproducibility of Results ; Sensitivity and Specificity ; Sequence Alignment - methods ; Sequence Analysis, DNA - methods</subject><ispartof>Bioinformatics, 2008-08, Vol.24 (16), p.i153-i159</ispartof><rights>The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 2008</rights><rights>The Author 2008. Published by Oxford University Press. All rights reserved. 
For Permissions, please email: journals.permissions@oxfordjournals.org</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c588t-73d928f0454003a7f3d7de2c1e91a4dfa07359788f9b0d790625c4dc0fc3b25e3</citedby><cites>FETCH-LOGICAL-c588t-73d928f0454003a7f3d7de2c1e91a4dfa07359788f9b0d790625c4dc0fc3b25e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1604,27923,27924</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btn298$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/18689818$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Bansal, Vikas</creatorcontrib><creatorcontrib>Bafna, Vineet</creatorcontrib><title>HapCUT: an efficient and accurate algorithm for the haplotype assembly problem</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time algorithms have been proposed for restricted versions of the problem. In this article, we consider the haplotype assembly problem in the most general setting, i.e. fragments of any length and with an arbitrary number of gaps. Results: We describe a novel combinatorial approach for the haplotype assembly problem based on computing max-cuts in certain graphs derived from the sequenced fragments. Levy et al. 
have sequenced the complete genome of a human individual and used a greedy heuristic to assemble the haplotypes for this individual. We have applied our method HapCUTto infer haplotypes from this data and demonstrate that the haplotypes inferred using HapCUT are significantly more accurate (20–25% lower maximum error correction scores for all chromosomes) than the greedy heuristic and a previously published method, Fast Hare. We also describe a maximum likelihood based estimator of the absolute accuracy of the sequence-based haplotypes using population haplotypes from the International HapMap project. Availability: A program implementing HapCUT is available on request. Contact: vibansal@cs.ucsd.edu</description><subject>Algorithms</subject><subject>Base Sequence</subject><subject>Chromosome Mapping - methods</subject><subject>Haplotypes - genetics</subject><subject>Molecular Sequence Data</subject><subject>Reproducibility of Results</subject><subject>Sensitivity and Specificity</subject><subject>Sequence Alignment - methods</subject><subject>Sequence Analysis, DNA - methods</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkc1u1DAUhS0EoqXwCKCIBbvQ6_ifHRoBg1oVIVqB2FiOYzMpSRxsR2LeHlcZFcGmXdnW_c45uj4IPcfwGoMip20f-smHOJrc23Ta5qlR8gE6xpRD3QBTD8udcFFTCeQIPUnpGoBhSuljdIQll0pieYwutmbeXF2-qcxUOe9727spl0dXGWuXaLKrzPAjxD7vxqrEVXnnqp2Zh5D3c5ml5MZ22FdzDO3gxqfokTdDcs8O5wm6ev_ucrOtzz99-Lh5e15bJmWuBelUIz1QRgGIEZ50onONxU5hQztvQBCmhJRetdAJBbxhlnYWvCVtwxw5Qa9W35L7a3Ep67FP1g2DmVxYkuaKAsVA7gQJJ6ShnN4JNhgrruQN-PI_8DoscSrbaqzKvxLAuEBshWwMKUXn9Rz70cS9xqBv-tP_9qfX_oruxcF8aUfX_VUdCisArEBY5nt71qukT9n9vhWZ-FNzQQTT22_fNZHk7Mv281ctyR__kbtj</recordid><startdate>20080815</startdate><enddate>20080815</enddate><creator>Bansal, Vikas</creator><creator>Bafna, Vineet</creator><general>Oxford University 
Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope></search><sort><creationdate>20080815</creationdate><title>HapCUT: an efficient and accurate algorithm for the haplotype assembly problem</title><author>Bansal, Vikas ; Bafna, Vineet</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c588t-73d928f0454003a7f3d7de2c1e91a4dfa07359788f9b0d790625c4dc0fc3b25e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Base Sequence</topic><topic>Chromosome Mapping - methods</topic><topic>Haplotypes - genetics</topic><topic>Molecular Sequence Data</topic><topic>Reproducibility of Results</topic><topic>Sensitivity and Specificity</topic><topic>Sequence Alignment - methods</topic><topic>Sequence Analysis, DNA - methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bansal, Vikas</creatorcontrib><creatorcontrib>Bafna, Vineet</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium 
Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Bansal, Vikas</au><au>Bafna, Vineet</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>HapCUT: an efficient and accurate algorithm for the haplotype assembly 
problem</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2008-08-15</date><risdate>2008</risdate><volume>24</volume><issue>16</issue><spage>i153</spage><epage>i159</epage><pages>i153-i159</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Motivation: The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time algorithms have been proposed for restricted versions of the problem. In this article, we consider the haplotype assembly problem in the most general setting, i.e. fragments of any length and with an arbitrary number of gaps. Results: We describe a novel combinatorial approach for the haplotype assembly problem based on computing max-cuts in certain graphs derived from the sequenced fragments. Levy et al. have sequenced the complete genome of a human individual and used a greedy heuristic to assemble the haplotypes for this individual. We have applied our method HapCUT to infer haplotypes from this data and demonstrate that the haplotypes inferred using HapCUT are significantly more accurate (20–25% lower maximum error correction scores for all chromosomes) than the greedy heuristic and a previously published method, Fast Hare. We also describe a maximum likelihood based estimator of the absolute accuracy of the sequence-based haplotypes using population haplotypes from the International HapMap project. Availability: A program implementing HapCUT is available on request. Contact: vibansal@cs.ucsd.edu</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>18689818</pmid><doi>10.1093/bioinformatics/btn298</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics, 2008-08, Vol.24 (16), p.i153-i159 |
issn | 1367-4803 ; 1460-2059 ; 1367-4811 |
language | eng |
recordid | cdi_proquest_miscellaneous_69404103 |
source | Oxford Journals Open Access Collection |
subjects | Algorithms ; Base Sequence ; Chromosome Mapping - methods ; Haplotypes - genetics ; Molecular Sequence Data ; Reproducibility of Results ; Sensitivity and Specificity ; Sequence Alignment - methods ; Sequence Analysis, DNA - methods |
title | HapCUT: an efficient and accurate algorithm for the haplotype assembly problem |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T13%3A35%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=HapCUT:%20an%20efficient%20and%20accurate%20algorithm%20for%20the%20haplotype%20assembly%20problem&rft.jtitle=Bioinformatics&rft.au=Bansal,%20Vikas&rft.date=2008-08-15&rft.volume=24&rft.issue=16&rft.spage=i153&rft.epage=i159&rft.pages=i153-i159&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btn298&rft_dat=%3Cproquest_TOX%3E21196984%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198683011&rft_id=info:pmid/18689818&rft_oup_id=10.1093/bioinformatics/btn298&rfr_iscdi=true |
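
The abstract compares methods by their error correction scores but does not define the score. As an illustration only, the sketch below assumes the definition commonly used in the haplotype assembly literature (often called minimum error correction, MEC): each fragment is charged the smaller of its mismatch counts against a candidate haplotype and against that haplotype's complement, and the charges are summed. This is not the HapCUT program and it does not implement the max-cut step; the function names, the 0/1 allele encoding, the use of `None` for gaps, and the toy fragments are all hypothetical.

```python
from typing import List, Optional, Sequence

# Assumption: alleles at heterozygous sites are encoded as 0 or 1,
# and None marks a site the fragment does not cover (a gap).
Fragment = Sequence[Optional[int]]


def mismatches(fragment: Fragment, haplotype: Sequence[int]) -> int:
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(
        1
        for allele, h in zip(fragment, haplotype)
        if allele is not None and allele != h
    )


def error_correction_score(fragments: List[Fragment], haplotype: Sequence[int]) -> int:
    """Sum, over fragments, the mismatches to the closer of the two haplotypes.

    The second haplotype is taken to be the bitwise complement of the first,
    which holds when every site considered is heterozygous.
    """
    complement = [1 - h for h in haplotype]
    return sum(
        min(mismatches(f, haplotype), mismatches(f, complement))
        for f in fragments
    )


# Toy data (hypothetical): four fragments over four variant sites.
fragments = [
    [0, 0, None, 1],
    [1, 1, 0, None],
    [None, 0, 1, 1],
    [1, None, 0, 1],
]
haplotype = [0, 0, 1, 1]
score = error_correction_score(fragments, haplotype)
print(score)
```

Under this assumed definition a lower score means fewer allele calls must be changed to make every fragment consistent with one of the two haplotypes, which is the sense in which the abstract reports lower scores as more accurate.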