SeqEntropy: Genome-Wide Assessment of Repeats for Short Read Sequencing. e59484
Published in: | PLoS ONE 2013-03, Vol.8 (3) |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Background: Recent studies on genome assembly from short-read sequencing data have reported that this technology cannot reconstruct the entire genome even at very high depth of coverage. We investigated this limitation from the perspective of information theory, evaluating the effect of repeats on short-read genome assembly using idealized (error-free) reads of different lengths. Methodology/Principal Findings: We define a metric H(k) as the entropy of sequencing reads at read length k, and use the relative loss of entropy ΔH(k) to measure the impact of repeats on the reconstruction of a whole genome from sequences of length k. In our experiments, entropy loss correlated well with the de novo assembly coverage of a genome, and a score of ΔH(k) > 1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths to achieve ΔH(k) |
---|---|
ISSN: | 1932-6203 |
DOI: | 10.1371/journal.pone.0059484 |
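
The abstract names the metric H(k) and its relative loss but this record does not give the formulas. The Python sketch below is one plausible reading, not the authors' implementation. It assumes H(k) is the Shannon entropy of the empirical distribution of all error-free length-k reads taken from a single strand, and that the relative loss compares H(k) with the maximum entropy log2(N) reached when all N = L - k + 1 reads are distinct. The function names `read_entropy` and `relative_entropy_loss` are hypothetical.

```python
import math
from collections import Counter


def read_entropy(genome: str, k: int) -> float:
    """Shannon entropy, in bits, of the empirical distribution of length-k reads.

    Assumption: one idealized read starts at every position of a single strand.
    """
    n = len(genome) - k + 1
    if k <= 0 or n <= 0:
        raise ValueError("k must be between 1 and the genome length")
    counts = Counter(genome[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def relative_entropy_loss(genome: str, k: int) -> float:
    """Fraction of the maximum entropy log2(N) lost because reads repeat.

    Assumption: the loss is measured against the repeat-free case, in which
    all N = len(genome) - k + 1 reads are distinct.
    """
    n = len(genome) - k + 1
    h = read_entropy(genome, k)  # also validates k
    h_max = math.log2(n)
    if h_max == 0.0:
        return 0.0  # a single read carries no entropy to lose
    return (h_max - h) / h_max


if __name__ == "__main__":
    toy = "ACGTACGT"  # made-up sequence; the 4-mer ACGT occurs twice
    for k in (2, 4, 6):
        print(k, round(relative_entropy_loss(toy, k), 4))
```

Under these assumptions a repeat-free sequence scores a loss of 0, and the score rises as more reads of length k become indistinguishable. Increasing k until the loss drops below a chosen threshold mirrors the minimal-read-length idea in the abstract.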