PBSIM: PacBio reads simulator--toward accurate genome assembly

PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specif...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Bioinformatics 2013-01, Vol.29 (1), p.119-121
Hauptverfasser: Ono, Yukiteru, Asai, Kiyoshi, Hamada, Michiaki
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results. PBSIM is freely available from the web under the GNU GPL v2 license (http://code.google.com/p/pbsim/).
ISSN:1367-4803
1367-4811
1460-2059
DOI:10.1093/bioinformatics/bts649