Leveraging known genomic variants to improve detection of variants, especially close-by Indels

Abstract Motivation The detection of genomic variants has great significance in genomics, bioinformatics, biomedical research and its applications. However, despite a lot of effort, Indels and structural variants are still under-characterized compared to SNPs. Current approaches based on next-genera...

Bibliographic details
Published in: Bioinformatics 2018-09, Vol.34 (17), p.2918-2926
Main authors: Vo, Nam S, Phan, Vinhthuy
Format: Article
Language: eng
container_end_page 2926
container_issue 17
container_start_page 2918
container_title Bioinformatics
container_volume 34
creator Vo, Nam S
Phan, Vinhthuy
description Abstract. Motivation: The detection of genomic variants is of great significance in genomics, bioinformatics, biomedical research and their applications. However, despite considerable effort, Indels and structural variants remain under-characterized compared to SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to detect such variants accurately, and Indels, especially those close to each other, are still hard to detect accurately. Results: We introduce a novel approach that leverages known variant information, e.g. provided by dbSNP, dbVar, ExAC or the 1000 Genomes Project, to improve the sensitivity of detecting variants, especially close-by Indels. In our approach, the standard reference genome and the known variants are combined to build a meta-reference, which is expected to be probabilistically closer to the subject genomes than the standard reference. An alignment algorithm that takes known variant information into account is developed to accurately align reads to the meta-reference. This strategy resulted in accurate alignment and variant calling even with low-coverage data. We showed that, compared to popular methods such as GATK and SAMtools, our method significantly improves the sensitivity of detecting variants, especially Indels that are close to each other. In particular, our method called these close-by Indels at 15-20% higher sensitivity than other methods at low coverage, and still achieved 1-5% higher sensitivity at high coverage, at competitive precision. These results were validated using simulated data with variant profiles extracted from the 1000 Genomes Project data, and real data from the Illumina Platinum Genomes Project and the ExAC database. Our findings suggest that by incorporating known variant information in an appropriate manner, sensitive variant calling is possible at low cost.
Availability and implementation: The implementation can be found in our public code repository https://github.com/namsyvo/IVC. Supplementary information: Supplementary data are available at Bioinformatics online.
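The meta-reference idea described in the abstract can be illustrated with a small sketch. This is not the IVC implementation; it is a toy model assuming that each known variant is given as a position, a reference allele, an alternate allele and a population allele frequency. The names `KnownVariant`, `build_meta_reference` and `most_likely_sequence` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownVariant:
    """Hypothetical record for one known variant (e.g. taken from a variant database)."""
    pos: int     # 0-based start position on the reference
    ref: str     # reference allele (one or more bases)
    alt: str     # alternate allele
    freq: float  # population frequency of the alternate allele


def build_meta_reference(reference, variants):
    """Map each variant site (pos, ref) to its alleles and their prior probabilities."""
    meta = {}
    for v in variants:
        if reference[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"reference allele mismatch at position {v.pos}")
        # The reference allele starts with all the probability mass;
        # each known alternate allele takes its frequency away from it.
        alleles = meta.setdefault((v.pos, v.ref), {v.ref: 1.0})
        alleles[v.alt] = v.freq
        alleles[v.ref] -= v.freq
    return meta


def most_likely_sequence(reference, meta):
    """Pick the most probable allele at every site; overlapping sites are skipped."""
    parts = []
    cursor = 0
    for pos, ref in sorted(meta):
        if pos < cursor:
            continue  # site overlaps one already consumed (toy simplification)
        parts.append(reference[cursor:pos])
        alleles = meta[(pos, ref)]
        parts.append(max(alleles, key=alleles.get))
        cursor = pos + len(ref)
    parts.append(reference[cursor:])
    return "".join(parts)


reference = "ACGTACGT"
variants = [
    KnownVariant(pos=2, ref="G", alt="T", freq=0.75),   # common SNP
    KnownVariant(pos=4, ref="AC", alt="A", freq=0.25),  # rare 1-base deletion
]
meta = build_meta_reference(reference, variants)
print(most_likely_sequence(reference, meta))  # -> ACTTACGT
```

In this toy example the common SNP replaces the reference base while the rare deletion does not, so the most probable sequence differs from the standard reference only where the known variant is the majority allele. A real aligner would keep all alleles and their probabilities available during alignment rather than collapsing to a single sequence.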
doi_str_mv 10.1093/bioinformatics/bty183
format Article
fullrecord <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_proquest_miscellaneous_2019812500</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/bty183</oup_id><sourcerecordid>2019812500</sourcerecordid><originalsourceid>FETCH-LOGICAL-c397t-b486755d98fbadc192f3dde681b9898b5458faba83daf095fbdfc9d457e3a2303</originalsourceid><addsrcrecordid>eNqNkE1LAzEQhoMotlZ_gpKjB9cmm6RNjiJ-FApe9OqSj0mJ7iZ1s1vpv7elteDN08zhed8ZHoQuKbmlRLGxCSlEn9pGd8HmsenWVLIjNKR8QoqSCHW82dlkWnBJ2ACd5fxBiKCc81M0KJVQpFR8iN7nsIJWL0Jc4M-YviNeQExNsHil26Bjl3GXcGiWbVoBdtCB7UKKOPkDcIMhL8EGXddrbOuUoTBrPIsO6nyOTryuM1zs5wi9PT683j8X85en2f3dvLBMTbvCcDmZCuGU9EY7S1XpmXMwkdQoqaQRXEivjZbMaU-U8MZ5qxwXU2C6ZISN0PWud_PnVw-5q5qQLdS1jpD6XJWEKklLQbao2KG2TTm34KtlGxrdritKqq3a6q_aaqd2k7van-hNA-6Q-nW5AcgOSP3yn50_SomORg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2019812500</pqid></control><display><type>article</type><title>Leveraging known genomic variants to improve detection of variants, especially close-by Indels</title><source>Oxford Journals Open Access Collection</source><creator>Vo, Nam S ; Phan, Vinhthuy</creator><contributor>Berger, Bonnie</contributor><creatorcontrib>Vo, Nam S ; Phan, Vinhthuy ; Berger, Bonnie</creatorcontrib><description>Abstract Motivation The detection of genomic variants has great significance in genomics, bioinformatics, biomedical research and its applications. However, despite a lot of effort, Indels and structural variants are still under-characterized compared to SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to be able to detect such types of variants accurately. However Indels, especially those close to each other, are still hard to detect accurately. Results We introduce a novel approach that leverages known variant information, e.g. 
provided by dbSNP, dbVar, ExAC or the 1000 Genomes Project, to improve sensitivity of detecting variants, especially close-by Indels. In our approach, the standard reference genome and the known variants are combined to build a meta-reference, which is expected to be probabilistically closer to the subject genomes than the standard reference. An alignment algorithm, which can take into account known variant information, is developed to accurately align reads to the meta-reference. This strategy resulted in accurate alignment and variant calling even with low coverage data. We showed that compared to popular methods such as GATK and SAMtools, our method significantly improves the sensitivity of detecting variants, especially Indels that are close to each other. In particular, our method was able to call these close-by Indels at a 15-20% higher sensitivity than other methods at low coverage, and still get 1-5% higher sensitivity at high coverage, at competitive precision. These results were validated using simulated data with variant profiles extracted from the 1000 Genomes Project data, and real data from the Illumina Platinum Genomes Project and ExAC database. Our finding suggests that by incorporating known variant information in an appropriate manner, sensitive variant calling is possible at a low cost. Availability and implementation Implementation can be found in our public code repository https://github.com/namsyvo/IVC. Supplementary information Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/bty183</identifier><identifier>PMID: 29590294</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><ispartof>Bioinformatics, 2018-09, Vol.34 (17), p.2918-2926</ispartof><rights>The Author(s) 2018. Published by Oxford University Press. 
All rights reserved. For permissions, please e-mail: journals.permissions@oup.com 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c397t-b486755d98fbadc192f3dde681b9898b5458faba83daf095fbdfc9d457e3a2303</citedby><cites>FETCH-LOGICAL-c397t-b486755d98fbadc192f3dde681b9898b5458faba83daf095fbdfc9d457e3a2303</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1604,27924,27925</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/bty183$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29590294$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Berger, Bonnie</contributor><creatorcontrib>Vo, Nam S</creatorcontrib><creatorcontrib>Phan, Vinhthuy</creatorcontrib><title>Leveraging known genomic variants to improve detection of variants, especially close-by Indels</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation The detection of genomic variants has great significance in genomics, bioinformatics, biomedical research and its applications. However, despite a lot of effort, Indels and structural variants are still under-characterized compared to SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to be able to detect such types of variants accurately. However Indels, especially those close to each other, are still hard to detect accurately. Results We introduce a novel approach that leverages known variant information, e.g. provided by dbSNP, dbVar, ExAC or the 1000 Genomes Project, to improve sensitivity of detecting variants, especially close-by Indels. 
In our approach, the standard reference genome and the known variants are combined to build a meta-reference, which is expected to be probabilistically closer to the subject genomes than the standard reference. An alignment algorithm, which can take into account known variant information, is developed to accurately align reads to the meta-reference. This strategy resulted in accurate alignment and variant calling even with low coverage data. We showed that compared to popular methods such as GATK and SAMtools, our method significantly improves the sensitivity of detecting variants, especially Indels that are close to each other. In particular, our method was able to call these close-by Indels at a 15-20% higher sensitivity than other methods at low coverage, and still get 1-5% higher sensitivity at high coverage, at competitive precision. These results were validated using simulated data with variant profiles extracted from the 1000 Genomes Project data, and real data from the Illumina Platinum Genomes Project and ExAC database. Our finding suggests that by incorporating known variant information in an appropriate manner, sensitive variant calling is possible at a low cost. Availability and implementation Implementation can be found in our public code repository https://github.com/namsyvo/IVC. 
Supplementary information Supplementary data are available at Bioinformatics online.</description><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNqNkE1LAzEQhoMotlZ_gpKjB9cmm6RNjiJ-FApe9OqSj0mJ7iZ1s1vpv7elteDN08zhed8ZHoQuKbmlRLGxCSlEn9pGd8HmsenWVLIjNKR8QoqSCHW82dlkWnBJ2ACd5fxBiKCc81M0KJVQpFR8iN7nsIJWL0Jc4M-YviNeQExNsHil26Bjl3GXcGiWbVoBdtCB7UKKOPkDcIMhL8EGXddrbOuUoTBrPIsO6nyOTryuM1zs5wi9PT683j8X85en2f3dvLBMTbvCcDmZCuGU9EY7S1XpmXMwkdQoqaQRXEivjZbMaU-U8MZ5qxwXU2C6ZISN0PWud_PnVw-5q5qQLdS1jpD6XJWEKklLQbao2KG2TTm34KtlGxrdritKqq3a6q_aaqd2k7van-hNA-6Q-nW5AcgOSP3yn50_SomORg</recordid><startdate>20180901</startdate><enddate>20180901</enddate><creator>Vo, Nam S</creator><creator>Phan, Vinhthuy</creator><general>Oxford University Press</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20180901</creationdate><title>Leveraging known genomic variants to improve detection of variants, especially close-by Indels</title><author>Vo, Nam S ; Phan, Vinhthuy</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c397t-b486755d98fbadc192f3dde681b9898b5458faba83daf095fbdfc9d457e3a2303</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vo, Nam S</creatorcontrib><creatorcontrib>Phan, Vinhthuy</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Vo, Nam S</au><au>Phan, Vinhthuy</au><au>Berger, 
Bonnie</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Leveraging known genomic variants to improve detection of variants, especially close-by Indels</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2018-09-01</date><risdate>2018</risdate><volume>34</volume><issue>17</issue><spage>2918</spage><epage>2926</epage><pages>2918-2926</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract Motivation The detection of genomic variants has great significance in genomics, bioinformatics, biomedical research and its applications. However, despite a lot of effort, Indels and structural variants are still under-characterized compared to SNPs. Current approaches based on next-generation sequencing data usually require large numbers of reads (high coverage) to be able to detect such types of variants accurately. However Indels, especially those close to each other, are still hard to detect accurately. Results We introduce a novel approach that leverages known variant information, e.g. provided by dbSNP, dbVar, ExAC or the 1000 Genomes Project, to improve sensitivity of detecting variants, especially close-by Indels. In our approach, the standard reference genome and the known variants are combined to build a meta-reference, which is expected to be probabilistically closer to the subject genomes than the standard reference. An alignment algorithm, which can take into account known variant information, is developed to accurately align reads to the meta-reference. This strategy resulted in accurate alignment and variant calling even with low coverage data. We showed that compared to popular methods such as GATK and SAMtools, our method significantly improves the sensitivity of detecting variants, especially Indels that are close to each other. 
In particular, our method was able to call these close-by Indels at a 15-20% higher sensitivity than other methods at low coverage, and still get 1-5% higher sensitivity at high coverage, at competitive precision. These results were validated using simulated data with variant profiles extracted from the 1000 Genomes Project data, and real data from the Illumina Platinum Genomes Project and ExAC database. Our finding suggests that by incorporating known variant information in an appropriate manner, sensitive variant calling is possible at a low cost. Availability and implementation Implementation can be found in our public code repository https://github.com/namsyvo/IVC. Supplementary information Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>29590294</pmid><doi>10.1093/bioinformatics/bty183</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2018-09, Vol.34 (17), p.2918-2926
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_2019812500
source Oxford Journals Open Access Collection
title Leveraging known genomic variants to improve detection of variants, especially close-by Indels
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T00%3A08%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Leveraging%20known%20genomic%20variants%20to%20improve%20detection%20of%20variants,%20especially%20close-by%20Indels&rft.jtitle=Bioinformatics&rft.au=Vo,%20Nam%20S&rft.date=2018-09-01&rft.volume=34&rft.issue=17&rft.spage=2918&rft.epage=2926&rft.pages=2918-2926&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/bty183&rft_dat=%3Cproquest_TOX%3E2019812500%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2019812500&rft_id=info:pmid/29590294&rft_oup_id=10.1093/bioinformatics/bty183&rfr_iscdi=true