Evaluation of software to map NGS reads from heterogeneous HCV1b populations

We studied the impact of analysis software and of reference sequence divergence on the recovery of minority variants with Illumina deep sequencing applied to highly variable HCV populations. Four HCV1b full genomes were sequenced by Sanger population and Illumina deep sequencing, starting from the s...

Bibliographic details
Main authors: Cuypers, Lize; Snoeck, Joke; Vrancken, Bram; Kerremans, Lien; Vuagniaux, Grégoire; Nevens, Frederik; Vandamme, Anne-Mieke
Format: Other
Language: eng
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Cuypers, Lize
Snoeck, Joke
Vrancken, Bram
Kerremans, Lien
Vuagniaux, Grégoire
Nevens, Frederik
Vandamme, Anne-Mieke
description We studied the impact of analysis software and of reference sequence divergence on the recovery of minority variants with Illumina deep sequencing applied to highly variable HCV populations. Four HCV1b full genomes were sequenced by Sanger population sequencing and by Illumina deep sequencing, starting from the same PCR fragments. Reads were mapped to a published HCV1b reference sequence, to the sample-specific Sanger sequence and to an in silico reconstructed data-specific reference sequence (VICUNA). Of the four tested software packages (MAQ, Bowtie, BWA and Segminator II), Segminator II consistently mapped the largest number of reads, recovering 77.3% (± 2.4), 80.6% (± 3.0) and 82.1% (± 4.0) of reads for the three reference sequences, respectively, whereas the other methods mapped between 29% and 77%. For all packages, the number of mapped reads increased when a sample-specific sequence was used instead of a more distantly related HCV1b reference sequence. For Segminator II, the concordance between the three NGS consensus sequences (representing the three mapping strategies) and the obtained Sanger sequence was >99% when neglecting ambiguities. At a 5% threshold, one sample showed 184 differences in ambiguities between the NGS analyses based on the data-specific and on the published HCV1b reference sequence; all were located in or near a divergent area and therefore most probably represented true minority variants. To conclude, for deep sequencing of highly variable viral sequences, we recommend the use of a sample-/data-specific reference sequence and of tailored mapping software in order to map the maximum number of reads and recover as many minority variants as possible.
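The 5% threshold mentioned in the description can be illustrated with a minimal sketch. It is not the authors' pipeline or the logic of any of the tested packages; the function name `minority_variants`, the per-position count dictionaries and the example numbers are all hypothetical, and the sketch simply assumes that base counts per position have already been extracted from the mapped reads.

```python
def minority_variants(counts, threshold=0.05):
    """Return (major_base, minority_bases) for one alignment position.

    counts: dict mapping a base to the number of mapped reads carrying it
            (hypothetical input, e.g. derived from a pileup).
    threshold: minimum read fraction for a non-major base to be reported
               (0.05 mirrors the 5% threshold in the description).
    """
    total = sum(counts.values())
    if total == 0:
        # No coverage at this position: nothing can be called.
        return None, []
    major = max(counts, key=counts.get)
    minors = sorted(
        base for base, n in counts.items()
        if base != major and n / total >= threshold
    )
    return major, minors


# Illustrative, made-up counts for three positions.
pileup = [
    {"A": 980, "G": 20},            # G at 2%: below the threshold
    {"C": 900, "T": 100},           # T at 10%: reported
    {"G": 500, "A": 440, "T": 60},  # A at 44% and T at 6%: both reported
]

for position, counts in enumerate(pileup, start=1):
    major, minors = minority_variants(counts)
    print(position, major, minors)
```

Because the fractions are computed from mapped reads only, reads that fail to map to a divergent reference never contribute to the counts, which is one way to see why the choice of reference sequence can change which ambiguities are reported.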
format Other
fullrecord <record><control><sourceid>kuleuven_FZOIL</sourceid><recordid>TN_cdi_kuleuven_dspace_123456789_479008</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>123456789_479008</sourcerecordid><originalsourceid>FETCH-kuleuven_dspace_123456789_4790083</originalsourceid><addsrcrecordid>eNqNzD0OgkAQQOFtLAx6h-ksjAkICtQEpTA2GtvNKLNqXJjN_qDHNzEegOo1L99UHOoBdUD_5B5YgWPl32gJPEOHBo77E1jC1oGy3MGDPFm-U08cHDTVJbmCYRP0D3AzMVGoHc3_jcRiV5-rZvUKmsJAvWydwRvJZJ1mm21elDLLyzgu0kgsx53Sf3w63v0CjG5GmA</addsrcrecordid><sourcetype>Institutional Repository</sourcetype><iscdi>true</iscdi><recordtype>other</recordtype></control><display><type>other</type><title>Evaluation of software to map NGS reads from heterogeneous HCV1b populations</title><source>Lirias (KU Leuven Association)</source><creator>Cuypers, Lize ; Snoeck, Joke ; Vrancken, Bram ; Kerremans, Lien ; Vuagniaux, Grégoire ; Nevens, Frederik ; Vandamme, Anne-Mieke</creator><creatorcontrib>Cuypers, Lize ; Snoeck, Joke ; Vrancken, Bram ; Kerremans, Lien ; Vuagniaux, Grégoire ; Nevens, Frederik ; Vandamme, Anne-Mieke</creatorcontrib><description>We studied the impact of analysis software and of reference sequence divergence on the recovery of minority variants with Illumina deep sequencing applied to highly variable HCV populations. Four HCV1b full genomes were sequenced by Sanger population and Illumina deep sequencing, starting from the same PCR fragments. Reads were mapped to a published HCV1b reference sequence, to the sample-specific Sanger sequence and to an in silico reconstructed data-specific reference sequence (VICUNA). Of the four tested software packages (MAQ, Bowtie, BWA and Segminator II), Segminator II consistently mapped the largest number of reads, recovering respectively 77.3% (± 2.4), 80.6% (± 3.0) and 82.1% (± 4.0) of reads whereas this was between 29% and 77% for the other methods. 
For all packages, the number of mapped reads increased when a sample-specific sequence was used instead of a more distantly related HCV1b reference sequence. For Segminator II, the concordance between the three NGS consensus sequences (representing the three mapping strategies) and the obtained Sanger sequence, was &gt;99% when neglecting ambiguities. When using a 5% threshold, for one sample 184 differences in ambiguities between the NGS analysis with a data-specific and HCV1b published reference sequence were all located in or near a divergent area, and therefore most probably represented true minority variants. To conclude, for deep sequencing of highly variable viral sequences, we recommend the use of a sample-/data-specific reference sequence and of tailored mapping software in order to map the maximum number of reads and recover as much as possible minority variants.</description><language>eng</language><creationdate>2014</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>315,780,27859</link.rule.ids><linktorsrc>$$Uhttps://lirias.kuleuven.be/handle/123456789/479008$$EView_record_in_KU_Leuven_Association$$FView_record_in_$$GKU_Leuven_Association$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>Cuypers, Lize</creatorcontrib><creatorcontrib>Snoeck, Joke</creatorcontrib><creatorcontrib>Vrancken, Bram</creatorcontrib><creatorcontrib>Kerremans, Lien</creatorcontrib><creatorcontrib>Vuagniaux, Grégoire</creatorcontrib><creatorcontrib>Nevens, Frederik</creatorcontrib><creatorcontrib>Vandamme, Anne-Mieke</creatorcontrib><title>Evaluation of software to map NGS reads from heterogeneous HCV1b populations</title><description>We studied the impact of analysis software and of reference sequence divergence on the recovery of minority variants 
with Illumina deep sequencing applied to highly variable HCV populations. Four HCV1b full genomes were sequenced by Sanger population and Illumina deep sequencing, starting from the same PCR fragments. Reads were mapped to a published HCV1b reference sequence, to the sample-specific Sanger sequence and to an in silico reconstructed data-specific reference sequence (VICUNA). Of the four tested software packages (MAQ, Bowtie, BWA and Segminator II), Segminator II consistently mapped the largest number of reads, recovering respectively 77.3% (± 2.4), 80.6% (± 3.0) and 82.1% (± 4.0) of reads whereas this was between 29% and 77% for the other methods. For all packages, the number of mapped reads increased when a sample-specific sequence was used instead of a more distantly related HCV1b reference sequence. For Segminator II, the concordance between the three NGS consensus sequences (representing the three mapping strategies) and the obtained Sanger sequence, was &gt;99% when neglecting ambiguities. When using a 5% threshold, for one sample 184 differences in ambiguities between the NGS analysis with a data-specific and HCV1b published reference sequence were all located in or near a divergent area, and therefore most probably represented true minority variants. 
To conclude, for deep sequencing of highly variable viral sequences, we recommend the use of a sample-/data-specific reference sequence and of tailored mapping software in order to map the maximum number of reads and recover as much as possible minority variants.</description><fulltext>true</fulltext><rsrctype>other</rsrctype><creationdate>2014</creationdate><recordtype>other</recordtype><sourceid>FZOIL</sourceid><recordid>eNqNzD0OgkAQQOFtLAx6h-ksjAkICtQEpTA2GtvNKLNqXJjN_qDHNzEegOo1L99UHOoBdUD_5B5YgWPl32gJPEOHBo77E1jC1oGy3MGDPFm-U08cHDTVJbmCYRP0D3AzMVGoHc3_jcRiV5-rZvUKmsJAvWydwRvJZJ1mm21elDLLyzgu0kgsx53Sf3w63v0CjG5GmA</recordid><startdate>201409</startdate><enddate>201409</enddate><creator>Cuypers, Lize</creator><creator>Snoeck, Joke</creator><creator>Vrancken, Bram</creator><creator>Kerremans, Lien</creator><creator>Vuagniaux, Grégoire</creator><creator>Nevens, Frederik</creator><creator>Vandamme, Anne-Mieke</creator><scope>FZOIL</scope></search><sort><creationdate>201409</creationdate><title>Evaluation of software to map NGS reads from heterogeneous HCV1b populations</title><author>Cuypers, Lize ; Snoeck, Joke ; Vrancken, Bram ; Kerremans, Lien ; Vuagniaux, Grégoire ; Nevens, Frederik ; Vandamme, Anne-Mieke</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-kuleuven_dspace_123456789_4790083</frbrgroupid><rsrctype>other</rsrctype><prefilter>other</prefilter><language>eng</language><creationdate>2014</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Cuypers, Lize</creatorcontrib><creatorcontrib>Snoeck, Joke</creatorcontrib><creatorcontrib>Vrancken, Bram</creatorcontrib><creatorcontrib>Kerremans, Lien</creatorcontrib><creatorcontrib>Vuagniaux, Grégoire</creatorcontrib><creatorcontrib>Nevens, Frederik</creatorcontrib><creatorcontrib>Vandamme, Anne-Mieke</creatorcontrib><collection>Lirias (KU Leuven Association)</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Cuypers, Lize</au><au>Snoeck, Joke</au><au>Vrancken, Bram</au><au>Kerremans, Lien</au><au>Vuagniaux, Grégoire</au><au>Nevens, Frederik</au><au>Vandamme, Anne-Mieke</au><format>book</format><genre>document</genre><ristype>GEN</ristype><title>Evaluation of software to map NGS reads from heterogeneous HCV1b populations</title><date>2014-09</date><risdate>2014</risdate><abstract>We studied the impact of analysis software and of reference sequence divergence on the recovery of minority variants with Illumina deep sequencing applied to highly variable HCV populations. Four HCV1b full genomes were sequenced by Sanger population and Illumina deep sequencing, starting from the same PCR fragments. Reads were mapped to a published HCV1b reference sequence, to the sample-specific Sanger sequence and to an in silico reconstructed data-specific reference sequence (VICUNA). Of the four tested software packages (MAQ, Bowtie, BWA and Segminator II), Segminator II consistently mapped the largest number of reads, recovering respectively 77.3% (± 2.4), 80.6% (± 3.0) and 82.1% (± 4.0) of reads whereas this was between 29% and 77% for the other methods. For all packages, the number of mapped reads increased when a sample-specific sequence was used instead of a more distantly related HCV1b reference sequence. For Segminator II, the concordance between the three NGS consensus sequences (representing the three mapping strategies) and the obtained Sanger sequence, was &gt;99% when neglecting ambiguities. When using a 5% threshold, for one sample 184 differences in ambiguities between the NGS analysis with a data-specific and HCV1b published reference sequence were all located in or near a divergent area, and therefore most probably represented true minority variants. 
To conclude, for deep sequencing of highly variable viral sequences, we recommend the use of a sample-/data-specific reference sequence and of tailored mapping software in order to map the maximum number of reads and recover as much as possible minority variants.</abstract><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier
ispartof
issn
language eng
recordid cdi_kuleuven_dspace_123456789_479008
source Lirias (KU Leuven Association)
title Evaluation of software to map NGS reads from heterogeneous HCV1b populations
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T21%3A23%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-kuleuven_FZOIL&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.au=Cuypers,%20Lize&rft.date=2014-09&rft_id=info:doi/&rft_dat=%3Ckuleuven_FZOIL%3E123456789_479008%3C/kuleuven_FZOIL%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true