Read clouds uncover variation in complex regions of the human genome

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Co...

Bibliographic details
Published in: Genome research 2015-10, Vol.25 (10), p.1570-1580
Main authors: Bishara, Alex ; Liu, Yuling ; Weng, Ziming ; Kashef-Haghighi, Dorna ; Newburger, Daniel E ; West, Robert ; Sidow, Arend ; Batzoglou, Serafim
Format: Article
Language: eng
Online access: Full text
container_end_page 1580
container_issue 10
container_start_page 1570
container_title Genome research
container_volume 25
creator Bishara, Alex
Liu, Yuling
Weng, Ziming
Kashef-Haghighi, Dorna
Newburger, Daniel E
West, Robert
Sidow, Arend
Batzoglou, Serafim
description Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.
doi_str_mv 10.1101/gr.191189.115
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4579342</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1718914177</sourcerecordid><originalsourceid>FETCH-LOGICAL-c486t-f1db75029852f8458ebf80112bc071261d411857f5c7a268fbf8c14f1fc85bd43</originalsourceid><addsrcrecordid>eNqNkc1LxDAQxYMo7rp69Co5eumaaZM0vQjiNywIoueQpkm30jZr0i763xvZddGbp2Tm_fKYyUPoFMgcgMBF7edQAIgilmwPTYHRImGUF_vxToRICsJggo5CeCOEZFSIQzRJeSo4Y3SKbp6NqrBu3VgFPPbarY3Ha-UbNTSux02PtetWrfnA3tSxE7CzeFgavBw71ePa9K4zx-jAqjaYk-05Q693ty_XD8ni6f7x-mqRaCr4kFioypyRtBAstYIyYUorCEBaapJDyqGicRGWW6ZzlXJho6yBWrBasLKi2QxdbnxXY9mZSpt-8KqVK990yn9Kpxr5V-mbpazdWlKWFxlNo8H51sC799GEQXZN0KZtVW_cGCTkOeccOMv-gcY_BxpfRDTZoNq7ELyxu4mAyO-QZO3lJqRYssif_V5jR_-kkn0Bfz2NLQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1718914177</pqid></control><display><type>article</type><title>Read clouds uncover variation in complex regions of the human genome</title><source>MEDLINE</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Bishara, Alex ; Liu, Yuling ; Weng, Ziming ; Kashef-Haghighi, Dorna ; Newburger, Daniel E ; West, Robert ; Sidow, Arend ; Batzoglou, Serafim</creator><creatorcontrib>Bishara, Alex ; Liu, Yuling ; Weng, Ziming ; Kashef-Haghighi, Dorna ; Newburger, Daniel E ; West, Robert ; Sidow, Arend ; Batzoglou, Serafim</creatorcontrib><description>Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. 
Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.</description><identifier>ISSN: 1088-9051</identifier><identifier>EISSN: 1549-5469</identifier><identifier>DOI: 10.1101/gr.191189.115</identifier><identifier>PMID: 26286554</identifier><language>eng</language><publisher>United States: Cold Spring Harbor Laboratory Press</publisher><subject>Algorithms ; Carcinoma, Ductal - genetics ; Carcinoma, Ductal, Breast - genetics ; DNA Fragmentation ; Genetic Variation ; Genome, Human ; Humans ; Method ; Sequence Alignment - methods ; Sequence Analysis, DNA - methods</subject><ispartof>Genome research, 2015-10, Vol.25 (10), p.1570-1580</ispartof><rights>2015 Bishara et al.; Published by Cold Spring Harbor Laboratory 
Press.</rights><rights>2015</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c486t-f1db75029852f8458ebf80112bc071261d411857f5c7a268fbf8c14f1fc85bd43</citedby><cites>FETCH-LOGICAL-c486t-f1db75029852f8458ebf80112bc071261d411857f5c7a268fbf8c14f1fc85bd43</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4579342/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4579342/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,315,728,781,785,886,27929,27930,53796,53798</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/26286554$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Bishara, Alex</creatorcontrib><creatorcontrib>Liu, Yuling</creatorcontrib><creatorcontrib>Weng, Ziming</creatorcontrib><creatorcontrib>Kashef-Haghighi, Dorna</creatorcontrib><creatorcontrib>Newburger, Daniel E</creatorcontrib><creatorcontrib>West, Robert</creatorcontrib><creatorcontrib>Sidow, Arend</creatorcontrib><creatorcontrib>Batzoglou, Serafim</creatorcontrib><title>Read clouds uncover variation in complex regions of the human genome</title><title>Genome research</title><addtitle>Genome Res</addtitle><description>Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. 
Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.</description><subject>Algorithms</subject><subject>Carcinoma, Ductal - genetics</subject><subject>Carcinoma, Ductal, Breast - genetics</subject><subject>DNA Fragmentation</subject><subject>Genetic Variation</subject><subject>Genome, Human</subject><subject>Humans</subject><subject>Method</subject><subject>Sequence Alignment - methods</subject><subject>Sequence Analysis, DNA - 
methods</subject><issn>1088-9051</issn><issn>1549-5469</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkc1LxDAQxYMo7rp69Co5eumaaZM0vQjiNywIoueQpkm30jZr0i763xvZddGbp2Tm_fKYyUPoFMgcgMBF7edQAIgilmwPTYHRImGUF_vxToRICsJggo5CeCOEZFSIQzRJeSo4Y3SKbp6NqrBu3VgFPPbarY3Ha-UbNTSux02PtetWrfnA3tSxE7CzeFgavBw71ePa9K4zx-jAqjaYk-05Q693ty_XD8ni6f7x-mqRaCr4kFioypyRtBAstYIyYUorCEBaapJDyqGicRGWW6ZzlXJho6yBWrBasLKi2QxdbnxXY9mZSpt-8KqVK990yn9Kpxr5V-mbpazdWlKWFxlNo8H51sC799GEQXZN0KZtVW_cGCTkOeccOMv-gcY_BxpfRDTZoNq7ELyxu4mAyO-QZO3lJqRYssif_V5jR_-kkn0Bfz2NLQ</recordid><startdate>20151001</startdate><enddate>20151001</enddate><creator>Bishara, Alex</creator><creator>Liu, Yuling</creator><creator>Weng, Ziming</creator><creator>Kashef-Haghighi, Dorna</creator><creator>Newburger, Daniel E</creator><creator>West, Robert</creator><creator>Sidow, Arend</creator><creator>Batzoglou, Serafim</creator><general>Cold Spring Harbor Laboratory Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>5PM</scope></search><sort><creationdate>20151001</creationdate><title>Read clouds uncover variation in complex regions of the human genome</title><author>Bishara, Alex ; Liu, Yuling ; Weng, Ziming ; Kashef-Haghighi, Dorna ; Newburger, Daniel E ; West, Robert ; Sidow, Arend ; Batzoglou, Serafim</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c486t-f1db75029852f8458ebf80112bc071261d411857f5c7a268fbf8c14f1fc85bd43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Algorithms</topic><topic>Carcinoma, Ductal - genetics</topic><topic>Carcinoma, Ductal, Breast - 
genetics</topic><topic>DNA Fragmentation</topic><topic>Genetic Variation</topic><topic>Genome, Human</topic><topic>Humans</topic><topic>Method</topic><topic>Sequence Alignment - methods</topic><topic>Sequence Analysis, DNA - methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bishara, Alex</creatorcontrib><creatorcontrib>Liu, Yuling</creatorcontrib><creatorcontrib>Weng, Ziming</creatorcontrib><creatorcontrib>Kashef-Haghighi, Dorna</creatorcontrib><creatorcontrib>Newburger, Daniel E</creatorcontrib><creatorcontrib>West, Robert</creatorcontrib><creatorcontrib>Sidow, Arend</creatorcontrib><creatorcontrib>Batzoglou, Serafim</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Genome research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bishara, Alex</au><au>Liu, Yuling</au><au>Weng, Ziming</au><au>Kashef-Haghighi, Dorna</au><au>Newburger, Daniel E</au><au>West, Robert</au><au>Sidow, Arend</au><au>Batzoglou, Serafim</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Read clouds uncover variation in complex regions of the human genome</atitle><jtitle>Genome research</jtitle><addtitle>Genome 
Res</addtitle><date>2015-10-01</date><risdate>2015</risdate><volume>25</volume><issue>10</issue><spage>1570</spage><epage>1580</epage><pages>1570-1580</pages><issn>1088-9051</issn><eissn>1549-5469</eissn><abstract>Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies.</abstract><cop>United States</cop><pub>Cold Spring Harbor Laboratory Press</pub><pmid>26286554</pmid><doi>10.1101/gr.191189.115</doi><tpages>11</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1088-9051
ispartof Genome research, 2015-10, Vol.25 (10), p.1570-1580
issn 1088-9051
1549-5469
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4579342
source MEDLINE; PubMed Central; Alma/SFX Local Collection
subjects Algorithms
Carcinoma, Ductal - genetics
Carcinoma, Ductal, Breast - genetics
DNA Fragmentation
Genetic Variation
Genome, Human
Humans
Method
Sequence Alignment - methods
Sequence Analysis, DNA - methods
title Read clouds uncover variation in complex regions of the human genome
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-11T12%3A53%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Read%20clouds%20uncover%20variation%20in%20complex%20regions%20of%20the%20human%20genome&rft.jtitle=Genome%20research&rft.au=Bishara,%20Alex&rft.date=2015-10-01&rft.volume=25&rft.issue=10&rft.spage=1570&rft.epage=1580&rft.pages=1570-1580&rft.issn=1088-9051&rft.eissn=1549-5469&rft_id=info:doi/10.1101/gr.191189.115&rft_dat=%3Cproquest_pubme%3E1718914177%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1718914177&rft_id=info:pmid/26286554&rfr_iscdi=true