Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing

Data filters separate true genetic variants in sequencing data from sequencing errors, but their effectiveness is difficult to assess. Reumers et al. use the genome sequences of monozygotic twins to evaluate the performance of filters individually and in combination, leading to a 290-fold reduction...

Bibliographic Details
Published in: Nature biotechnology 2012-01, Vol.30 (1), p.61-68
Main Authors: Reumers, Joke; De Rijk, Peter; Zhao, Hui; Liekens, Anthony; Smeets, Dominiek; Cleary, John; Van Loo, Peter; Van Den Bossche, Maarten; Catthoor, Kirsten; Sabbe, Bernard; Despierre, Evelyn; Vergote, Ignace; Hilbush, Brian; Lambrechts, Diether; Del-Favero, Jurgen
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page 68
container_issue 1
container_start_page 61
container_title Nature biotechnology
container_volume 30
creator Reumers, Joke
De Rijk, Peter
Zhao, Hui
Liekens, Anthony
Smeets, Dominiek
Cleary, John
Van Loo, Peter
Van Den Bossche, Maarten
Catthoor, Kirsten
Sabbe, Bernard
Despierre, Evelyn
Vergote, Ignace
Hilbush, Brian
Lambrechts, Diether
Del-Favero, Jurgen
description Data filters separate true genetic variants in sequencing data from sequencing errors, but their effectiveness is difficult to assess. Reumers et al. use the genome sequences of monozygotic twins to evaluate the performance of filters individually and in combination, leading to a 290-fold reduction in error rate in calling single-nucleotide variants. Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. By sequencing the nearly identical genomes from monozygotic twins and considering shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate by 290-fold, facilitating the identification of genetic differences between monozygotic twins. We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs.
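The twin-based evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' software: the read-depth filter, the function names, the toy genotypes, and the exact definitions of sensitivity (fraction of shared SNVs that pass the filter) and specificity (fraction of discordant SNVs that the filter removes) are assumptions made here for illustration only.

```python
# Illustrative sketch only; all names, data, and the depth filter are hypothetical.

def label_sites(twin_a, twin_b):
    """Label each called site: True if both twins carry the same call
    (treated as a 'true variant'), False if the calls are discordant
    (treated as an 'error')."""
    labels = {}
    for site in twin_a.keys() | twin_b.keys():
        call_a = twin_a.get(site)
        labels[site] = call_a is not None and call_a == twin_b.get(site)
    return labels


def evaluate_filter(labels, depth, min_depth):
    """Return (sensitivity, specificity) for a single threshold filter
    that keeps a site only when its depth is at least min_depth."""
    n_true = sum(labels.values())
    n_err = len(labels) - n_true
    kept_true = sum(1 for s, shared in labels.items()
                    if shared and depth[s] >= min_depth)
    removed_err = sum(1 for s, shared in labels.items()
                      if not shared and depth[s] < min_depth)
    sensitivity = kept_true / n_true if n_true else 0.0
    specificity = removed_err / n_err if n_err else 0.0
    return sensitivity, specificity


# Toy calls keyed by position; values are genotype strings (made-up data).
twin_a = {100: "A/G", 200: "C/T", 300: "G/G", 400: "T/C"}
twin_b = {100: "A/G", 200: "C/T", 300: "G/A"}
depth = {100: 30, 200: 8, 300: 5, 400: 25}  # hypothetical per-site read depth

labels = label_sites(twin_a, twin_b)

# Scan candidate thresholds to see the sensitivity/specificity trade-off.
for threshold in (6, 10):
    sens, spec = evaluate_filter(labels, depth, threshold)
    print(f"min_depth={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Scanning a range of thresholds in this way shows the trade-off that threshold optimization has to balance: a stricter threshold removes more discordant calls but also discards shared ones.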
doi_str_mv 10.1038/nbt.2053
format Article
fullrecord <record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_miscellaneous_915379849</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A277436331</galeid><sourcerecordid>A277436331</sourcerecordid><originalsourceid>FETCH-LOGICAL-c679t-7593306b137df5ca46e3f98879ed76ee2bc326c020edd3da999b4ccdc7c5685f3</originalsourceid><addsrcrecordid>eNqN0l2L1DAUBuAiiruugr9AAiIq2DFpmqS5XBY_FhYG_LotaXI6k6WTzuak4vrrTdnR2VlEpJDm4jkn5OQtiqeMLhjlzdvQpUVFBb9XHDNRy5JJLe_nPW1USZmQR8UjxEtKqaylfFgcVRVTjdb1cWGW2-Q3_ic40vshQfRhRSK4yQKStAYCMY6RRJOA-EAcJLBpNisI48Zb8t1Eb0JC0l0TXI8xlRGMIwhXEwSb5ePiQW8GhCe7_0nx9f27L2cfy4vlh_Oz04vSSqVTqYTmnMqOceV6YU0tgfe6aZQGpyRA1VleSUsrCs5xZ7TWXW2ts8oK2YienxQvb_pu45jPxtRuPFoYBhNgnLDVTHClm1pn-eqfklGmtGS15Jk-v0MvxymGfI9ZCdVUQtO9WpkBWh_6MUVj56btaaVUzXMnltXiLyp_DvIgxwD5AeCw4PVBQTYJfqSVmRDb88-f_t8uvx3aN7dsN6EPgHlBv1onvCk54Ltx2TgiRujbbfQbE6_zCNo5fm2OXzvHL9Nnu3FN3QbcH_g7bxm82AGD1gx9NDkjuHeiVpVkfH8d3M6JhHh77ncO_QV6IOyE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1015782590</pqid></control><display><type>article</type><title>Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing</title><source>MEDLINE</source><source>Springer Nature - Complete Springer Journals</source><source>Nature Journals Online</source><creator>Reumers, Joke ; De Rijk, Peter ; Zhao, Hui ; Liekens, Anthony ; Smeets, Dominiek ; Cleary, John ; Van Loo, Peter ; Van Den Bossche, Maarten ; Catthoor, Kirsten ; Sabbe, Bernard ; Despierre, Evelyn ; Vergote, Ignace ; Hilbush, Brian ; Lambrechts, Diether ; Del-Favero, Jurgen</creator><creatorcontrib>Reumers, Joke ; De Rijk, Peter ; Zhao, Hui ; Liekens, Anthony ; Smeets, Dominiek ; Cleary, John ; Van Loo, Peter ; Van Den Bossche, Maarten ; Catthoor, Kirsten ; Sabbe, Bernard ; Despierre, Evelyn ; Vergote, Ignace ; Hilbush, Brian ; Lambrechts, Diether ; Del-Favero, 
Jurgen</creatorcontrib><description>Data filters separate true genetic variants in sequencing data from sequencing errors, but their effectiveness is difficult to assess. Reumers et al . use the genome sequences of monozygotic twins to evaluate the performance of filters individually and in combination, leading to a 290-fold reduction in error rate in calling single-nucleotide variants. Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. By sequencing the nearly identical genomes from monozygotic twins and considering shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate by 290-fold, facilitating the identification of genetic differences between monozygotic twins. 
We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs.</description><identifier>ISSN: 1087-0156</identifier><identifier>EISSN: 1546-1696</identifier><identifier>DOI: 10.1038/nbt.2053</identifier><identifier>PMID: 22178994</identifier><identifier>CODEN: NABIF9</identifier><language>eng</language><publisher>New York: Nature Publishing Group US</publisher><subject>631/114 ; 631/208/726/649 ; 631/61/514 ; Agriculture ; analysis ; Bioinformatics ; Biological and medical sciences ; Biomedical and Life Sciences ; Biomedical Engineering/Biotechnology ; Biomedicine ; Biotechnology ; Diverse techniques ; DNA sequencing ; Female ; Filters ; Fundamental and applied biological sciences. Psychology ; Gene mutations ; Genetic aspects ; Genetic research ; Genome, Human - genetics ; Genomes ; Genomics ; HapMap Project ; Humans ; Life Sciences ; Male ; Molecular and cellular biology ; Mutation ; Neoplasms - genetics ; Nucleotide sequencing ; Polymorphism, Single Nucleotide - genetics ; Research Design ; Sequence Analysis, DNA - methods ; Software ; Tumors ; Twins ; Twins, Monozygotic - genetics</subject><ispartof>Nature biotechnology, 2012-01, Vol.30 (1), p.61-68</ispartof><rights>Springer Nature America, Inc. 
2011</rights><rights>2015 INIST-CNRS</rights><rights>COPYRIGHT 2012 Nature Publishing Group</rights><rights>Copyright Nature Publishing Group Jan 2012</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c679t-7593306b137df5ca46e3f98879ed76ee2bc326c020edd3da999b4ccdc7c5685f3</citedby><cites>FETCH-LOGICAL-c679t-7593306b137df5ca46e3f98879ed76ee2bc326c020edd3da999b4ccdc7c5685f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1038/nbt.2053$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1038/nbt.2053$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,778,782,27907,27908,41471,42540,51302</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=25472613$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/22178994$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Reumers, Joke</creatorcontrib><creatorcontrib>De Rijk, Peter</creatorcontrib><creatorcontrib>Zhao, Hui</creatorcontrib><creatorcontrib>Liekens, Anthony</creatorcontrib><creatorcontrib>Smeets, Dominiek</creatorcontrib><creatorcontrib>Cleary, John</creatorcontrib><creatorcontrib>Van Loo, Peter</creatorcontrib><creatorcontrib>Van Den Bossche, Maarten</creatorcontrib><creatorcontrib>Catthoor, Kirsten</creatorcontrib><creatorcontrib>Sabbe, Bernard</creatorcontrib><creatorcontrib>Despierre, Evelyn</creatorcontrib><creatorcontrib>Vergote, Ignace</creatorcontrib><creatorcontrib>Hilbush, Brian</creatorcontrib><creatorcontrib>Lambrechts, Diether</creatorcontrib><creatorcontrib>Del-Favero, Jurgen</creatorcontrib><title>Optimized filtering reduces the error rate in detecting 
genomic variants by short-read sequencing</title><title>Nature biotechnology</title><addtitle>Nat Biotechnol</addtitle><addtitle>Nat Biotechnol</addtitle><description>Data filters separate true genetic variants in sequencing data from sequencing errors, but their effectiveness is difficult to assess. Reumers et al . use the genome sequences of monozygotic twins to evaluate the performance of filters individually and in combination, leading to a 290-fold reduction in error rate in calling single-nucleotide variants. Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. By sequencing the nearly identical genomes from monozygotic twins and considering shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate by 290-fold, facilitating the identification of genetic differences between monozygotic twins. 
We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs.</description><subject>631/114</subject><subject>631/208/726/649</subject><subject>631/61/514</subject><subject>Agriculture</subject><subject>analysis</subject><subject>Bioinformatics</subject><subject>Biological and medical sciences</subject><subject>Biomedical and Life Sciences</subject><subject>Biomedical Engineering/Biotechnology</subject><subject>Biomedicine</subject><subject>Biotechnology</subject><subject>Diverse techniques</subject><subject>DNA sequencing</subject><subject>Female</subject><subject>Filters</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Gene mutations</subject><subject>Genetic aspects</subject><subject>Genetic research</subject><subject>Genome, Human - genetics</subject><subject>Genomes</subject><subject>Genomics</subject><subject>HapMap Project</subject><subject>Humans</subject><subject>Life Sciences</subject><subject>Male</subject><subject>Molecular and cellular biology</subject><subject>Mutation</subject><subject>Neoplasms - genetics</subject><subject>Nucleotide sequencing</subject><subject>Polymorphism, Single Nucleotide - genetics</subject><subject>Research Design</subject><subject>Sequence Analysis, DNA - methods</subject><subject>Software</subject><subject>Tumors</subject><subject>Twins</subject><subject>Twins, Monozygotic - 
genetics</subject><issn>1087-0156</issn><issn>1546-1696</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>N95</sourceid><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNqN0l2L1DAUBuAiiruugr9AAiIq2DFpmqS5XBY_FhYG_LotaXI6k6WTzuak4vrrTdnR2VlEpJDm4jkn5OQtiqeMLhjlzdvQpUVFBb9XHDNRy5JJLe_nPW1USZmQR8UjxEtKqaylfFgcVRVTjdb1cWGW2-Q3_ic40vshQfRhRSK4yQKStAYCMY6RRJOA-EAcJLBpNisI48Zb8t1Eb0JC0l0TXI8xlRGMIwhXEwSb5ePiQW8GhCe7_0nx9f27L2cfy4vlh_Oz04vSSqVTqYTmnMqOceV6YU0tgfe6aZQGpyRA1VleSUsrCs5xZ7TWXW2ts8oK2YienxQvb_pu45jPxtRuPFoYBhNgnLDVTHClm1pn-eqfklGmtGS15Jk-v0MvxymGfI9ZCdVUQtO9WpkBWh_6MUVj56btaaVUzXMnltXiLyp_DvIgxwD5AeCw4PVBQTYJfqSVmRDb88-f_t8uvx3aN7dsN6EPgHlBv1onvCk54Ltx2TgiRujbbfQbE6_zCNo5fm2OXzvHL9Nnu3FN3QbcH_g7bxm82AGD1gx9NDkjuHeiVpVkfH8d3M6JhHh77ncO_QV6IOyE</recordid><startdate>20120101</startdate><enddate>20120101</enddate><creator>Reumers, Joke</creator><creator>De Rijk, Peter</creator><creator>Zhao, Hui</creator><creator>Liekens, Anthony</creator><creator>Smeets, Dominiek</creator><creator>Cleary, John</creator><creator>Van Loo, Peter</creator><creator>Van Den Bossche, Maarten</creator><creator>Catthoor, Kirsten</creator><creator>Sabbe, Bernard</creator><creator>Despierre, Evelyn</creator><creator>Vergote, Ignace</creator><creator>Hilbush, Brian</creator><creator>Lambrechts, Diether</creator><creator>Del-Favero, Jurgen</creator><general>Nature Publishing Group US</general><general>Nature Publishing 
Group</general><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>N95</scope><scope>XI7</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7T7</scope><scope>7TK</scope><scope>7TM</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>88I</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M2O</scope><scope>M2P</scope><scope>M7P</scope><scope>M7S</scope><scope>MBDVC</scope><scope>P64</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>Q9U</scope><scope>RC3</scope><scope>7QH</scope><scope>7UA</scope><scope>7X8</scope></search><sort><creationdate>20120101</creationdate><title>Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing</title><author>Reumers, Joke ; De Rijk, Peter ; Zhao, Hui ; Liekens, Anthony ; Smeets, Dominiek ; Cleary, John ; Van Loo, Peter ; Van Den Bossche, Maarten ; Catthoor, Kirsten ; Sabbe, Bernard ; Despierre, Evelyn ; Vergote, Ignace ; Hilbush, Brian ; Lambrechts, Diether ; Del-Favero, 
Jurgen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c679t-7593306b137df5ca46e3f98879ed76ee2bc326c020edd3da999b4ccdc7c5685f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>631/114</topic><topic>631/208/726/649</topic><topic>631/61/514</topic><topic>Agriculture</topic><topic>analysis</topic><topic>Bioinformatics</topic><topic>Biological and medical sciences</topic><topic>Biomedical and Life Sciences</topic><topic>Biomedical Engineering/Biotechnology</topic><topic>Biomedicine</topic><topic>Biotechnology</topic><topic>Diverse techniques</topic><topic>DNA sequencing</topic><topic>Female</topic><topic>Filters</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Gene mutations</topic><topic>Genetic aspects</topic><topic>Genetic research</topic><topic>Genome, Human - genetics</topic><topic>Genomes</topic><topic>Genomics</topic><topic>HapMap Project</topic><topic>Humans</topic><topic>Life Sciences</topic><topic>Male</topic><topic>Molecular and cellular biology</topic><topic>Mutation</topic><topic>Neoplasms - genetics</topic><topic>Nucleotide sequencing</topic><topic>Polymorphism, Single Nucleotide - genetics</topic><topic>Research Design</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Software</topic><topic>Tumors</topic><topic>Twins</topic><topic>Twins, Monozygotic - genetics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Reumers, Joke</creatorcontrib><creatorcontrib>De Rijk, Peter</creatorcontrib><creatorcontrib>Zhao, Hui</creatorcontrib><creatorcontrib>Liekens, Anthony</creatorcontrib><creatorcontrib>Smeets, Dominiek</creatorcontrib><creatorcontrib>Cleary, John</creatorcontrib><creatorcontrib>Van Loo, Peter</creatorcontrib><creatorcontrib>Van Den Bossche, Maarten</creatorcontrib><creatorcontrib>Catthoor, Kirsten</creatorcontrib><creatorcontrib>Sabbe, 
Bernard</creatorcontrib><creatorcontrib>Despierre, Evelyn</creatorcontrib><creatorcontrib>Vergote, Ignace</creatorcontrib><creatorcontrib>Hilbush, Brian</creatorcontrib><creatorcontrib>Lambrechts, Diether</creatorcontrib><creatorcontrib>Del-Favero, Jurgen</creatorcontrib><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale Business: Insights</collection><collection>Business Insights: Essentials</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Industrial and Applied Microbiology Abstracts (Microbiology A)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni 
Edition)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>Natural Science Collection (ProQuest)</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Research Library</collection><collection>Science Database (ProQuest)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Research Library (Corporate)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>ProQuest Central Basic</collection><collection>Genetics 
Abstracts</collection><collection>Aqualine</collection><collection>Water Resources Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Nature biotechnology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Reumers, Joke</au><au>De Rijk, Peter</au><au>Zhao, Hui</au><au>Liekens, Anthony</au><au>Smeets, Dominiek</au><au>Cleary, John</au><au>Van Loo, Peter</au><au>Van Den Bossche, Maarten</au><au>Catthoor, Kirsten</au><au>Sabbe, Bernard</au><au>Despierre, Evelyn</au><au>Vergote, Ignace</au><au>Hilbush, Brian</au><au>Lambrechts, Diether</au><au>Del-Favero, Jurgen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing</atitle><jtitle>Nature biotechnology</jtitle><stitle>Nat Biotechnol</stitle><addtitle>Nat Biotechnol</addtitle><date>2012-01-01</date><risdate>2012</risdate><volume>30</volume><issue>1</issue><spage>61</spage><epage>68</epage><pages>61-68</pages><issn>1087-0156</issn><eissn>1546-1696</eissn><coden>NABIF9</coden><abstract>Data filters separate true genetic variants in sequencing data from sequencing errors, but their effectiveness is difficult to assess. Reumers et al . use the genome sequences of monozygotic twins to evaluate the performance of filters individually and in combination, leading to a 290-fold reduction in error rate in calling single-nucleotide variants. Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. 
By sequencing the nearly identical genomes from monozygotic twins and considering shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate by 290-fold, facilitating the identification of genetic differences between monozygotic twins. We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs.</abstract><cop>New York</cop><pub>Nature Publishing Group US</pub><pmid>22178994</pmid><doi>10.1038/nbt.2053</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1087-0156
ispartof Nature biotechnology, 2012-01, Vol.30 (1), p.61-68
issn 1087-0156
1546-1696
language eng
recordid cdi_proquest_miscellaneous_915379849
source MEDLINE; Springer Nature - Complete Springer Journals; Nature Journals Online
subjects 631/114
631/208/726/649
631/61/514
Agriculture
analysis
Bioinformatics
Biological and medical sciences
Biomedical and Life Sciences
Biomedical Engineering/Biotechnology
Biomedicine
Biotechnology
Diverse techniques
DNA sequencing
Female
Filters
Fundamental and applied biological sciences. Psychology
Gene mutations
Genetic aspects
Genetic research
Genome, Human - genetics
Genomes
Genomics
HapMap Project
Humans
Life Sciences
Male
Molecular and cellular biology
Mutation
Neoplasms - genetics
Nucleotide sequencing
Polymorphism, Single Nucleotide - genetics
Research Design
Sequence Analysis, DNA - methods
Software
Tumors
Twins
Twins, Monozygotic - genetics
title Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T17%3A03%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimized%20filtering%20reduces%20the%20error%20rate%20in%20detecting%20genomic%20variants%20by%20short-read%20sequencing&rft.jtitle=Nature%20biotechnology&rft.au=Reumers,%20Joke&rft.date=2012-01-01&rft.volume=30&rft.issue=1&rft.spage=61&rft.epage=68&rft.pages=61-68&rft.issn=1087-0156&rft.eissn=1546-1696&rft.coden=NABIF9&rft_id=info:doi/10.1038/nbt.2053&rft_dat=%3Cgale_proqu%3EA277436331%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1015782590&rft_id=info:pmid/22178994&rft_galeid=A277436331&rfr_iscdi=true