Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing
Published in: | Nature biotechnology 2012-01, Vol.30 (1), p.61-68 |
---|---|
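Main authors: | Reumers, Joke; De Rijk, Peter; Zhao, Hui; Liekens, Anthony; Smeets, Dominiek; Cleary, John; Van Loo, Peter; Van Den Bossche, Maarten; Catthoor, Kirsten; Sabbe, Bernard; Despierre, Evelyn; Vergote, Ignace; Hilbush, Brian; Lambrechts, Diether; Del-Favero, Jurgen |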
Format: | Article |
Language: | eng |
container_end_page | 68 |
---|---|
container_issue | 1 |
container_start_page | 61 |
container_title | Nature biotechnology |
container_volume | 30 |
creator | Reumers, Joke; De Rijk, Peter; Zhao, Hui; Liekens, Anthony; Smeets, Dominiek; Cleary, John; Van Loo, Peter; Van Den Bossche, Maarten; Catthoor, Kirsten; Sabbe, Bernard; Despierre, Evelyn; Vergote, Ignace; Hilbush, Brian; Lambrechts, Diether; Del-Favero, Jurgen |
description | Data filters separate true genetic variants in sequencing data from sequencing errors, but their effectiveness is difficult to assess. Reumers et al. use the genome sequences of monozygotic twins to evaluate the performance of filters individually and in combination, leading to a 290-fold reduction in error rate in calling single-nucleotide variants.
Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. By sequencing the nearly identical genomes from monozygotic twins and considering shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate by 290-fold, facilitating the identification of genetic differences between monozygotic twins. We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs. |
doi_str_mv | 10.1038/nbt.2053 |
format | Article |
pmid | 22178994 |
coden | NABIF9 |
publisher | Nature Publishing Group US, New York |
fulltext | fulltext |
identifier | ISSN: 1087-0156 |
ispartof | Nature biotechnology, 2012-01, Vol.30 (1), p.61-68 |
issn | 1087-0156; 1546-1696 |
language | eng |
recordid | cdi_proquest_miscellaneous_915379849 |
source | MEDLINE; Springer Nature - Complete Springer Journals; Nature Journals Online |
subjects | 631/114; 631/208/726/649; 631/61/514; Agriculture; analysis; Bioinformatics; Biological and medical sciences; Biomedical and Life Sciences; Biomedical Engineering/Biotechnology; Biomedicine; Biotechnology; Diverse techniques; DNA sequencing; Female; Filters; Fundamental and applied biological sciences. Psychology; Gene mutations; Genetic aspects; Genetic research; Genome, Human - genetics; Genomes; Genomics; HapMap Project; Humans; Life Sciences; Male; Molecular and cellular biology; Mutation; Neoplasms - genetics; Nucleotide sequencing; Polymorphism, Single Nucleotide - genetics; Research Design; Sequence Analysis, DNA - methods; Software; Tumors; Twins; Twins, Monozygotic - genetics |
title | Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T17%3A03%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimized%20filtering%20reduces%20the%20error%20rate%20in%20detecting%20genomic%20variants%20by%20short-read%20sequencing&rft.jtitle=Nature%20biotechnology&rft.au=Reumers,%20Joke&rft.date=2012-01-01&rft.volume=30&rft.issue=1&rft.spage=61&rft.epage=68&rft.pages=61-68&rft.issn=1087-0156&rft.eissn=1546-1696&rft.coden=NABIF9&rft_id=info:doi/10.1038/nbt.2053&rft_dat=%3Cgale_proqu%3EA277436331%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1015782590&rft_id=info:pmid/22178994&rft_galeid=A277436331&rfr_iscdi=true |
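The evaluation idea in the abstract (SNVs shared by both twins treated as true variants, SNVs seen in only one twin treated as errors, filters scored individually and in combination) can be illustrated with a minimal Python sketch. Everything concrete here is a hypothetical assumption, not the authors' method or their software tool: the `Call` record, the `min_depth`/`min_quality` filters and their thresholds, the toy calls, and the definitions used for "error rate" (discordant sites as a fraction of all retained sites) and "sensitivity" (fraction of originally shared sites still shared after filtering).

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical SNV identity: (chromosome, position, alternate allele).
SnvKey = Tuple[str, int, str]


@dataclass(frozen=True)
class Call:
    key: SnvKey
    depth: int      # hypothetical read-depth annotation
    quality: float  # hypothetical variant-quality annotation


Filter = Callable[[Call], bool]  # returns True to KEEP a call


def concordance(twin_a: List[Call], twin_b: List[Call], keep: Filter) -> Tuple[int, int]:
    """Filter both twins, then count shared ('true') and discordant ('error') sites."""
    a = {c.key for c in twin_a if keep(c)}
    b = {c.key for c in twin_b if keep(c)}
    return len(a & b), len(a ^ b)


def error_rate(shared: int, discordant: int) -> float:
    """Assumed definition: discordant sites as a fraction of all retained sites."""
    total = shared + discordant
    return discordant / total if total else 0.0


def evaluate_combinations(twin_a: List[Call], twin_b: List[Call],
                          filters: Dict[str, Filter]) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Score every non-empty combination of filters against the unfiltered calls."""
    base_shared, base_disc = concordance(twin_a, twin_b, lambda c: True)
    base_err = error_rate(base_shared, base_disc)
    names = sorted(filters)
    results = {}
    for r in range(1, len(names) + 1):
        for combo in itertools.combinations(names, r):
            # A call is kept only if it passes every filter in the combination.
            keep = lambda c, combo=combo: all(filters[n](c) for n in combo)
            shared, disc = concordance(twin_a, twin_b, keep)
            err = error_rate(shared, disc)
            results[combo] = {
                "sensitivity": shared / base_shared if base_shared else 0.0,
                "error_rate": err,
                "fold_reduction": base_err / err if err else float("inf"),
            }
    return results


# Toy data (invented for illustration): three sites shared by both twins,
# plus discordant calls that fail different filters.
SHARED = [("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 400, "C")]
TWIN_A = [Call(k, 30, 50.0) for k in SHARED] + [
    Call(("chr2", 300, "T"), 4, 10.0),   # fails both filters
    Call(("chr4", 600, "T"), 20, 15.0),  # fails the quality filter only
]
TWIN_B = [Call(k, 28, 48.0) for k in SHARED] + [
    Call(("chr3", 500, "A"), 3, 8.0),    # fails both filters
    Call(("chr5", 700, "G"), 5, 40.0),   # fails the depth filter only
    Call(("chr6", 800, "C"), 30, 50.0),  # passes both filters
]
FILTERS: Dict[str, Filter] = {
    "min_depth": lambda c: c.depth >= 10,      # hypothetical threshold
    "min_quality": lambda c: c.quality >= 20,  # hypothetical threshold
}

results = evaluate_combinations(TWIN_A, TWIN_B, FILTERS)
for combo, stats in results.items():
    print(" + ".join(combo), stats)
```

With these toy calls the unfiltered discordant fraction is 0.625. Either filter alone lowers it to 0.4, and applying both lowers it to 0.25 (a 2.5-fold reduction) while all three shared sites survive. Exhaustive enumeration covers 2^n - 1 combinations for n filters, so it stays cheap only for small filter sets. Under this labeling, a genuine difference between the twins cannot be told apart from an error.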