Sequencing error correction without a reference genome
Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology; however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Identifying sequencing...
Published in: | BMC bioinformatics 2013-12, Vol.14 (1), p.367-367, Article 367 |
---|---|
Main authors: | Sleep, Julie A; Schreiber, Andreas W; Baumann, Ute |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 367 |
---|---|
container_issue | 1 |
container_start_page | 367 |
container_title | BMC bioinformatics |
container_volume | 14 |
creator | Sleep, Julie A; Schreiber, Andreas W; Baumann, Ute |
description | Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology; however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Distinguishing sequencing errors from true biological variants is a challenging task. For organisms without a reference genome, this task is even more difficult.
We have developed a method for the correction of sequencing errors in data from the Illumina Solexa sequencing platforms. It does not require a reference genome and is of relevance for microRNA studies, unsequenced genomes, variant detection in ultra-deep sequencing and even for RNA-Seq studies of organisms with sequenced genomes where RNA editing is being considered.
The derived error model is novel in that it allows different error probabilities for each position along the read, in conjunction with different error rates depending on the particular nucleotides involved in the substitution, and does not force these effects to behave in a multiplicative manner. The model provides error rates that capture the complex effects and interactions of the three main known causes of sequencing error associated with the Illumina platforms. |
doi_str_mv | 10.1186/1471-2105-14-367 |
format | Article |
fullrecord | recordid: TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3879328; sourcetype: Open Access Repository; recordtype: article; PMID: 24350580; galeid: A534518236; pqid: 1473912612; ISSN: 1471-2105; EISSN: 1471-2105; publisher: England: BioMed Central Ltd; date: 2013-12-18; peer reviewed; open access (free to read). Rights: 2013 Sleep et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Full text (HTML): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3879328/; PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/24350580 |
fulltext | fulltext |
identifier | ISSN: 1471-2105 |
ispartof | BMC bioinformatics, 2013-12, Vol.14 (1), p.367-367, Article 367 |
issn | 1471-2105 1471-2105 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3879328 |
source | MEDLINE; Springer Nature - Complete Springer Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; PubMed Central Open Access; Springer Nature OA Free Journals |
subjects | Animals; Base Sequence; Chromosome Mapping; Computer Simulation; DNA sequencing; Error-correcting codes; Genome, Human; High-Throughput Nucleotide Sequencing - methods; High-Throughput Nucleotide Sequencing - standards; Humans; Methodology; Methods; Models, Genetic; Nucleotide sequencing; RNA sequencing; Sequence Analysis, DNA; Sequence Analysis, RNA; Technology application |
title | Sequencing error correction without a reference genome |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T16%3A20%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sequencing%20error%20correction%20without%20a%20reference%20genome&rft.jtitle=BMC%20bioinformatics&rft.au=Sleep,%20Julie%20A&rft.date=2013-12-18&rft.volume=14&rft.issue=1&rft.spage=367&rft.epage=367&rft.pages=367-367&rft.artnum=367&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/1471-2105-14-367&rft_dat=%3Cgale_pubme%3EA534518236%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1473912612&rft_id=info:pmid/24350580&rft_galeid=A534518236&rfr_iscdi=true |
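
The description above says the error model allows a separate error rate for each read position and each nucleotide substitution, without forcing the two effects to combine multiplicatively. The following is a minimal sketch of what that distinction means, not the authors' implementation: the function names, the plain frequency estimate, and the toy counts are all assumptions made for illustration, since this record does not describe how the rates are actually estimated.

```python
from itertools import permutations

BASES = "ACGT"
# The 12 possible substitutions, written as "true>called".
SUBSTITUTIONS = [a + ">" + b for a, b in permutations(BASES, 2)]


def estimate_rates(error_counts, totals):
    """Full table: one rate per (read position, substitution) pair.

    error_counts[pos][sub]  = miscalls of type `sub` seen at position `pos`
    totals[pos][true_base]  = times `true_base` was the true base at `pos`
    (Hypothetical inputs; a plain frequency estimate is assumed.)
    """
    rates = []
    for pos, counts in enumerate(error_counts):
        row = {}
        for sub in SUBSTITUTIONS:
            n = totals[pos][sub[0]]
            row[sub] = counts.get(sub, 0) / n if n else 0.0
        rates.append(row)
    return rates


def multiplicative_approx(rates):
    """Restricted table: rate = position effect * substitution effect.

    Built from the row and column means of the full table, so it cannot
    represent a substitution whose rate rises faster than the others
    towards the end of the read.
    """
    n_pos = len(rates)
    pos_mean = [sum(row.values()) / len(SUBSTITUTIONS) for row in rates]
    sub_mean = {s: sum(row[s] for row in rates) / n_pos for s in SUBSTITUTIONS}
    grand = sum(pos_mean) / n_pos
    if grand == 0:
        return [{s: 0.0 for s in SUBSTITUTIONS} for _ in rates]
    return [
        {s: pos_mean[p] * sub_mean[s] / grand for s in SUBSTITUTIONS}
        for p in range(n_pos)
    ]


# Toy data (made up): two read positions, 1000 observations of each true base.
toy_counts = [{"A>C": 1, "G>T": 1}, {"A>C": 2, "G>T": 8}]
toy_totals = [{b: 1000 for b in BASES} for _ in toy_counts]

rates = estimate_rates(toy_counts, toy_totals)
approx = multiplicative_approx(rates)

# The full table keeps the observed G>T rate at each position; the
# multiplicative table smooths it, overstating it at position 0.
print(rates[0]["G>T"], approx[0]["G>T"])
```

In this toy table the G>T rate grows eightfold between the two positions while A>C only doubles, which is the kind of interaction a purely multiplicative model has to average away.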