MuffinEc: Error correction for de Novo assembly via greedy partitioning and sequence alignment

• We present an error correction method based on grouping reads and Multiple Sequence Alignment (MSA).
• This method supports any sequencing technology because it handles all types of errors, including indels.
• PacBio datasets can be corrected without an existing genome or a helper dataset with less err...

Detailed description

Bibliographic details
Published in: Information sciences 2016-02, Vol.329, p.206-219
Main authors: Alic, Andy S., Tomas, Andres, Medina, Ignacio, Blanquer, Ignacio
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 219
container_issue
container_start_page 206
container_title Information sciences
container_volume 329
creator Alic, Andy S.
Tomas, Andres
Medina, Ignacio
Blanquer, Ignacio
description • We present an error correction method based on grouping reads and Multiple Sequence Alignment (MSA).
• This method supports any sequencing technology because it handles all types of errors, including indels.
• PacBio datasets can be corrected without an existing genome or a helper dataset with less errors.
• Our method is faster and uses less memory than the state of the art.
• MuffinEc obtains better sensitivity, specificity and gain in most of our experiments.
Error correction is typically the first step of de Novo genome assembly from NGS data. This step has an important impact on the quality and speed of the assembly process. However, the majority of available stand-alone error correction solutions can only detect and correct mismatches. Therefore, these solutions only support correcting reads generated by Illumina sequencers. Several solutions support insertions and deletions (indels) and are capable of working with multiple technologies. However, these solutions are limited by correction performance and resource consumption. In this paper, we introduce MuffinEc, an indel-aware multi-technology correction method for NGS data. This method uses a greedy approach to create groups of reads and subsequently corrects them using their consensus. MuffinEc surpasses existing solutions by offering better correction ratios for multiple technologies. This method also exploits parallel processing via OpenMP and uses less computational resources than similar programs, thereby being capable of handling large datasets. MuffinEc is open source and freely available at http://muffinec.sourceforge.net.
doi_str_mv 10.1016/j.ins.2015.09.012
format Article
publisher Elsevier Inc
rights 2015 Elsevier Inc.
toplevel peer_reviewed
tpages 14
fulltext fulltext
identifier ISSN: 0020-0255
ispartof Information sciences, 2016-02, Vol.329, p.206-219
issn 0020-0255
1872-6291
language eng
recordid cdi_proquest_miscellaneous_1786155112
source ScienceDirect Journals (5 years ago - present)
subjects Assembly
Computation
Consumption
De novo
Deletion
Error correction
Genomes
Genomic error correction
Insertion
Mathematical models
Multiple sequence alignment
Next generation sequencing
title MuffinEc: Error correction for de Novo assembly via greedy partitioning and sequence alignment
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T19%3A02%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MuffinEc:%20Error%20correction%20for%20de%20Novo%20assembly%20via%20greedy%20partitioning%20and%20sequence%20alignment&rft.jtitle=Information%20sciences&rft.au=Alic,%20Andy%20S.&rft.date=2016-02-01&rft.volume=329&rft.spage=206&rft.epage=219&rft.pages=206-219&rft.issn=0020-0255&rft.eissn=1872-6291&rft_id=info:doi/10.1016/j.ins.2015.09.012&rft_dat=%3Cproquest_cross%3E1786155112%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1786155112&rft_id=info:pmid/&rft_els_id=S0020025515006660&rfr_iscdi=true
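
The description above says the method greedily forms groups of reads and then corrects each group using its consensus. The record gives no further algorithmic detail, so the following is only a toy sketch of that general idea, not MuffinEc's implementation. It assumes equal-length reads that already start at the same position, groups them by Hamming distance to a seed read, and takes a column-wise majority vote. It therefore handles substitutions only, whereas the paper relies on Multiple Sequence Alignment to cover indels as well. All function names and the `max_mismatches` and `min_group_size` parameters are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_groups(reads: list[str], max_mismatches: int) -> list[list[int]]:
    """Greedily partition read indices (toy stand-in for the paper's grouping).

    Each still-unassigned read seeds a new group and pulls in every later
    unassigned read of the same length within max_mismatches of the seed.
    """
    assigned = [False] * len(reads)
    groups = []
    for i, seed in enumerate(reads):
        if assigned[i]:
            continue
        assigned[i] = True
        group = [i]
        for j in range(i + 1, len(reads)):
            if assigned[j] or len(reads[j]) != len(seed):
                continue
            if hamming(seed, reads[j]) <= max_mismatches:
                assigned[j] = True
                group.append(j)
        groups.append(group)
    return groups


def consensus(seqs: list[str]) -> str:
    """Column-wise majority vote over equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def correct_reads(reads: list[str], max_mismatches: int = 1,
                  min_group_size: int = 3) -> list[str]:
    """Replace each read in a sufficiently large group by the group consensus."""
    corrected = list(reads)
    for group in greedy_groups(reads, max_mismatches):
        if len(group) < min_group_size:
            continue  # too few reads to outvote a sequencing error
        cons = consensus([reads[i] for i in group])
        for i in group:
            corrected[i] = cons
    return corrected


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "TTTTGGGG", "ACGTACGT"]
    # The third read carries one substitution (T -> A) that the vote removes;
    # the unrelated fourth read ends up alone in its group and is left as is.
    print(correct_reads(reads))
```

A real corrector would align the reads within each group before voting, so that an inserted or deleted base shifts into its own alignment column instead of corrupting every position after it.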