A recurrence-based approach for validating structural variation using long-read sequencing technology

Abstract Although numerous algorithms have been developed to identify structural variations (SVs) in genomic sequences, there is a dearth of approaches that can be used to evaluate their results. This is significant as the accurate identification of structural variation is still an outstanding but i...

Detailed description

Bibliographic details
Published in: Gigascience 2017-08, Vol.6 (8), p.1-9
Main authors: Zhao, Xuefang; Weber, Alexandra M.; Mills, Ryan E.
Format: Article
Language: eng
Online access: Full text
container_end_page 9
container_issue 8
container_start_page 1
container_title Gigascience
container_volume 6
creator Zhao, Xuefang
Weber, Alexandra M.
Mills, Ryan E.
description Abstract: Although numerous algorithms have been developed to identify structural variations (SVs) in genomic sequences, there is a dearth of approaches that can be used to evaluate their results. This is significant because the accurate identification of structural variation is still an outstanding but important problem in genomics. The emergence of new sequencing technologies that generate longer sequence reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this manner require large computational resources to assemble the sequences, as well as visual inspection of each region. Here we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long-read sequencing data. We assessed the performance of VaPoR on SVs in both simulated and real genomes and report a high-fidelity rate for overall accuracy across different levels of sequence depth. We show that VaPoR can interrogate a much larger range of SVs while still matching existing methods in terms of false positive validations, and that it provides additional features concerning breakpoint precision and predicted genotype. We further show that VaPoR can run quickly and efficiently without requiring a large processing or assembly pipeline. VaPoR provides a long read–based validation approach for genomic SVs that requires relatively low read depth and computing resources and thus will provide utility with targeted or low-pass sequencing coverage for accurate SV assessment. The VaPoR software is available at: https://github.com/mills-lab/vapor.
doi_str_mv 10.1093/gigascience/gix061
format Article
eissn 2047-217X
pmid 28873962
publisher United States: Oxford University Press
rights The Authors 2017. Published by Oxford University Press.
lds50 peer_reviewed
oa free_for_read
date 2017-08-01
tpages 9
sourcetype Open Access Repository
sourcerecordid 2719060769
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737365/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737365/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/28873962
fulltext fulltext
identifier ISSN: 2047-217X
ispartof Gigascience, 2017-08, Vol.6 (8), p.1-9
issn 2047-217X
2047-217X
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5737365
source MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Oxford Journals Open Access Collection; PubMed Central
subjects Algorithms
Computational Biology - methods
Computer applications
Computer Simulation
DNA Copy Number Variations
Genomes
Genomic Structural Variation
Genomics
Genomics - methods
Genotype
Genotypes
High-Throughput Nucleotide Sequencing - methods
Inspection
Reproducibility of Results
ROC Curve
Sequence Analysis, DNA
Technical Note
Vapors
Variation
title A recurrence-based approach for validating structural variation using long-read sequencing technology
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T04%3A47%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20recurrence-based%20approach%20for%20validating%20structural%20variation%20using%20long-read%20sequencing%20technology&rft.jtitle=Gigascience&rft.au=Zhao,%20Xuefang&rft.date=2017-08-01&rft.volume=6&rft.issue=8&rft.spage=1&rft.epage=9&rft.pages=1-9&rft.issn=2047-217X&rft.eissn=2047-217X&rft_id=info:doi/10.1093/gigascience/gix061&rft_dat=%3Cproquest_pubme%3E2719060769%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2719060769&rft_id=info:pmid/28873962&rft_oup_id=10.1093/gigascience/gix061&rfr_iscdi=true