Evaluation of software to map NGS reads from heterogeneous HCV1b populations


Bibliographic details
Main authors: Cuypers, Lize; Snoeck, Joke; Vrancken, Bram; Kerremans, Lien; Vuagniaux, Grégoire; Nevens, Frederik; Vandamme, Anne-Mieke
Format: Other
Language: English
Description
Summary: We studied the impact of analysis software and of reference sequence divergence on the recovery of minority variants with Illumina deep sequencing applied to highly variable HCV populations. Four HCV1b full genomes were sequenced by Sanger population sequencing and by Illumina deep sequencing, starting from the same PCR fragments. Reads were mapped to a published HCV1b reference sequence, to the sample-specific Sanger sequence, and to an in silico reconstructed data-specific reference sequence (VICUNA). Of the four software packages tested (MAQ, Bowtie, BWA and Segminator II), Segminator II consistently mapped the largest number of reads, recovering 77.3% (± 2.4), 80.6% (± 3.0) and 82.1% (± 4.0) of reads with these three references, respectively, whereas the other methods recovered between 29% and 77%. For all packages, the number of mapped reads increased when a sample-specific sequence was used instead of a more distantly related HCV1b reference sequence. For Segminator II, the concordance between the three NGS consensus sequences (one per mapping strategy) and the Sanger sequence exceeded 99% when ambiguities were disregarded. At a 5% threshold, one sample showed 184 differences in ambiguities between the NGS analyses based on the data-specific and the published HCV1b reference sequences; all were located in or near a divergent region and therefore most probably represent true minority variants. In conclusion, for deep sequencing of highly variable viral sequences, we recommend using a sample- or data-specific reference sequence together with tailored mapping software, in order to map the maximum number of reads and recover as many minority variants as possible.
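
The two quantities discussed in the summary, a consensus sequence with ambiguities called at a 5% threshold and the concordance between two sequences when ambiguities are disregarded, can be illustrated with a minimal Python sketch. This is not the authors' pipeline or the behaviour of any of the tools named above. The function names (`call_base`, `consensus`, `concordance`) and the input layout (one string of observed bases per genome position) are hypothetical, and the sketch assumes the threshold is applied to per-position base frequencies.

```python
from collections import Counter

# IUPAC nucleotide codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(column, threshold=0.05):
    """Call one position: keep every base whose frequency reaches the threshold.

    `column` is a string of the bases observed at this position (hypothetical
    input layout). Characters other than A, C, G, T are ignored.
    """
    counts = Counter(b for b in column.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return "N"  # no usable coverage at this position
    kept = frozenset(b for b, n in counts.items() if n / total >= threshold)
    return IUPAC.get(kept, "N")


def consensus(pileup, threshold=0.05):
    """Build a consensus string from a list of per-position base columns."""
    return "".join(call_base(col, threshold) for col in pileup)


def concordance(seq_a, seq_b):
    """Fraction of identical positions, skipping any position where either
    sequence carries an ambiguity code (assumes the sequences are aligned)."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b)
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        return float("nan")
    return sum(a == b for a, b in pairs) / len(pairs)


# Toy data: a 4% variant is dropped, a 10% variant becomes an ambiguity (R).
toy_pileup = ["A" * 96 + "G" * 4, "A" * 90 + "G" * 10, "C" * 100]
toy_consensus = consensus(toy_pileup)
toy_concordance = concordance(toy_consensus, "AGC")
```

In this toy case the second position is reported as an ambiguity and is therefore excluded from the concordance calculation, which is why two consensus sequences can be highly concordant while still differing in the minority variants they report.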