Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences

Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally i...

Detailed description

Bibliographic details
Published in:	Bioinformatics (Oxford, England), 2016-07, Vol.32 (14), p.2103-2110
Main author:	Li, Heng
Format:	Article
Language:	eng
Subjects:	Animals; Bacteria - genetics; bioinformatics; Caenorhabditis elegans; Caenorhabditis elegans - genetics; Chromosome Mapping - methods; Computational Biology; consensus sequence; genome assembly; High-Throughput Nucleotide Sequencing - methods; Humans; nanopores; Original Papers; Sequence Analysis, DNA; Software
Online access:	Full text

container_end_page	2110
container_issue	14
container_start_page	2103
container_title	Bioinformatics (Oxford, England)
container_volume	32
creator	Li, Heng
description	Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads. We present a new mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as that of raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between our tools and current tools. Availability: https://github.com/lh3/minimap and https://github.com/lh3/miniasm. Contact: hengli@broadinstitute.org. Supplementary data are available at Bioinformatics online.
doi_str_mv	10.1093/bioinformatics/btw152
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4937194</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1803798122</sourcerecordid><originalsourceid>FETCH-LOGICAL-c510t-fb0b345db03ea2dd504623516957ed90d641e22ea3999cd77ed19dee2c6c76843</originalsourceid><addsrcrecordid>eNqFUctOwzAQtBCIlsIngHLkEupHnMQckFDFSwL1AmfLsTfFKLFLnBb173FpqeiJ065mZ2cfg9A5wVcECzaurLeu9l2reqvDuOq_CKcHaEhYXqRZScjhLsdsgE5C-MAYc8zzYzSgBeGMCzZE0xfrbKvmiXImaWOuQnud1Cr0SUTn1s1-KgYS55c-USFAWzWrJE6OiA2rpPGRE-BzAU5DOEVHtWoCnG3jCL3d371OHtPn6cPT5PY51ZzgPq0rXLGMmwozUNQYjrOcMk5ywQswAps8I0ApKCaE0KaIIBEGgOpcF3mZsRG62ejOF1ULRoPrO9XIeReP6VbSKyv3K86-y5lfykywgoi1wOVWoPNx99DL1gYNTaMc-EWQlHJGeVlk5F8qiR8uREkojVS-oerOh9BBvduIYLn2Te77Jje-xb6Lv-fsun6NYt_8oZr1</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1803798122</pqid></control><display><type>article</type><title>Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Li, Heng</creator><creatorcontrib>Li, Heng</creatorcontrib><description>Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads. We present a new mapper, minimap and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. 
They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools. https://github.com/lh3/minimap and https://github.com/lh3/miniasm hengli@broadinstitute.org Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>ISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btw152</identifier><identifier>PMID: 27153593</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Animals ; Bacteria - genetics ; bioinformatics ; Caenorhabditis elegans ; Caenorhabditis elegans - genetics ; Chromosome Mapping - methods ; Computational Biology ; consensus sequence ; genome assembly ; High-Throughput Nucleotide Sequencing - methods ; Humans ; nanopores ; Original Papers ; Sequence Analysis, DNA ; Software</subject><ispartof>Bioinformatics (Oxford, England), 2016-07, Vol.32 (14), p.2103-2110</ispartof><rights>The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.</rights><rights>The Author 2016. Published by Oxford University Press. All rights reserved. 
For Permissions, please e-mail: journals.permissions@oup.com 2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c510t-fb0b345db03ea2dd504623516957ed90d641e22ea3999cd77ed19dee2c6c76843</citedby><cites>FETCH-LOGICAL-c510t-fb0b345db03ea2dd504623516957ed90d641e22ea3999cd77ed19dee2c6c76843</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4937194/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4937194/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/27153593$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Heng</creatorcontrib><title>Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences</title><title>Bioinformatics (Oxford, England)</title><addtitle>Bioinformatics</addtitle><description>Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads. We present a new mapper, minimap and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. 
They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools. https://github.com/lh3/minimap and https://github.com/lh3/miniasm hengli@broadinstitute.org Supplementary data are available at Bioinformatics online.</description><subject>Animals</subject><subject>Bacteria - genetics</subject><subject>bioinformatics</subject><subject>Caenorhabditis elegans</subject><subject>Caenorhabditis elegans - genetics</subject><subject>Chromosome Mapping - methods</subject><subject>Computational Biology</subject><subject>consensus sequence</subject><subject>genome assembly</subject><subject>High-Throughput Nucleotide Sequencing - methods</subject><subject>Humans</subject><subject>nanopores</subject><subject>Original Papers</subject><subject>Sequence Analysis, DNA</subject><subject>Software</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFUctOwzAQtBCIlsIngHLkEupHnMQckFDFSwL1AmfLsTfFKLFLnBb173FpqeiJ065mZ2cfg9A5wVcECzaurLeu9l2reqvDuOq_CKcHaEhYXqRZScjhLsdsgE5C-MAYc8zzYzSgBeGMCzZE0xfrbKvmiXImaWOuQnud1Cr0SUTn1s1-KgYS55c-USFAWzWrJE6OiA2rpPGRE-BzAU5DOEVHtWoCnG3jCL3d371OHtPn6cPT5PY51ZzgPq0rXLGMmwozUNQYjrOcMk5ywQswAps8I0ApKCaE0KaIIBEGgOpcF3mZsRG62ejOF1ULRoPrO9XIeReP6VbSKyv3K86-y5lfykywgoi1wOVWoPNx99DL1gYNTaMc-EWQlHJGeVlk5F8qiR8uREkojVS-oerOh9BBvduIYLn2Te77Jje-xb6Lv-fsun6NYt_8oZr1</recordid><startdate>20160715</startdate><enddate>20160715</enddate><creator>Li, Heng</creator><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7S9</scope><scope>L.6</scope><scope>5PM</scope></search><sort><creationdate>20160715</creationdate><title>Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences</title><author>Li, Heng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c510t-fb0b345db03ea2dd504623516957ed90d641e22ea3999cd77ed19dee2c6c76843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Animals</topic><topic>Bacteria - genetics</topic><topic>bioinformatics</topic><topic>Caenorhabditis elegans</topic><topic>Caenorhabditis elegans - genetics</topic><topic>Chromosome Mapping - methods</topic><topic>Computational Biology</topic><topic>consensus sequence</topic><topic>genome assembly</topic><topic>High-Throughput Nucleotide Sequencing - methods</topic><topic>Humans</topic><topic>nanopores</topic><topic>Original Papers</topic><topic>Sequence Analysis, DNA</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Heng</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>AGRICOLA</collection><collection>AGRICOLA - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Heng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Minimap and miniasm: 
fast mapping and de novo assembly for noisy long sequences</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><addtitle>Bioinformatics</addtitle><date>2016-07-15</date><risdate>2016</risdate><volume>32</volume><issue>14</issue><spage>2103</spage><epage>2110</epage><pages>2103-2110</pages><issn>1367-4803</issn><issn>1460-2059</issn><eissn>1367-4811</eissn><abstract>Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads. We present a new mapper, minimap and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools. https://github.com/lh3/minimap and https://github.com/lh3/miniasm hengli@broadinstitute.org Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>27153593</pmid><doi>10.1093/bioinformatics/btw152</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1367-4803
ispartof	Bioinformatics (Oxford, England), 2016-07, Vol.32 (14), p.2103-2110
issn	1367-4803 1460-2059 1367-4811
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4937194
source	Oxford Journals Open Access Collection; MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection
subjects	Animals; Bacteria - genetics; bioinformatics; Caenorhabditis elegans; Caenorhabditis elegans - genetics; Chromosome Mapping - methods; Computational Biology; consensus sequence; genome assembly; High-Throughput Nucleotide Sequencing - methods; Humans; nanopores; Original Papers; Sequence Analysis, DNA; Software
title	Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T20%3A38%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Minimap%20and%20miniasm:%20fast%20mapping%20and%20de%20novo%20assembly%20for%20noisy%20long%20sequences&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Li,%20Heng&rft.date=2016-07-15&rft.volume=32&rft.issue=14&rft.spage=2103&rft.epage=2110&rft.pages=2103-2110&rft.issn=1367-4803&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btw152&rft_dat=%3Cproquest_pubme%3E1803798122%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1803798122&rft_id=info:pmid/27153593&rfr_iscdi=true