The results of the comparative taxonomic and genomic analysis of the viromes from Lake Baikal and other freshwater bodies
Main authors: | , , , , , |
---|---|
Format: | Dataset |
Language: | English |
Summary: | The dataset contains the tables and files that present the results of a comparative genomic, taxonomic and functional analysis of viral communities from Lake Baikal and other freshwater lakes [1-8]. Taxonomic identification of the sequences (metagenomic reads) was carried out using the BLASTn [9] and DIAMOND [10] programs against the NCBI RefSeq complete viral genome and proteome database [11]. De novo assembly was carried out using metaSPAdes, the metagenomic assembler of SPAdes 3.13.0 [12]. The VirSorter tool [13] was used to identify viral scaffolds and viral proteins. Functional annotation of viral proteins in the viromes of Lake Baikal was carried out using the COG (Clusters of Orthologous Groups) [14] and KEGG pathway [15] classification groups. For taxonomic analysis, comparisons of DNA reads with complete viral genomes using BLASTn were run on five high-performance nodes, each with two Intel Xeon E5-2695 v4 "Broadwell" CPUs (36 cores in total) and 128 GB RAM; total calculation time was approximately 24 hours. Comparisons of DNA reads with complete viral proteomes using DIAMOND were run on five nodes of the same configuration; total calculation time was approximately 3 hours. The paired-read assembly was performed on a machine with AMD Opteron 6278 processors (8 CPUs, 64 cores in total) and 945 GB RAM; total assembly time was approximately 399 hours. Full list of Supplementary Materials. Figures. Figure S1: A complete scheme of the bioinformatic analysis. The stages of the analysis are highlighted with red boxes, the resulting datasets are blue with a dashed stroke, and the databases (DB) used are purple. Figure S2: Dominant viral families in the investigated Baikal viromes. Only percentages greater than one percent are shown in the diagrams. Figure S3: The general functional annotation of the Baikal virome datasets using the COG (A) and KEGG pathway (B) databases. Figure S4: The phyla of the hosts predicted for the Baikal viruses using the Virus-Host database (A) and the VirHostMatcher-Net software (B). Tables. Table S1: The list and taxonomy of virotypes found in the analyzed freshwater viromes (initial number of reads per virotype). Table S2: The list and taxonomy of virotypes found in the analyzed freshwater viromes (number of reads per virotype normalized to genome length). Table S3: The list and taxonomy of virotypes found in the analyzed freshwater viromes (number of reads per virotype normalized to genome le |
---|---|
DOI: | 10.6084/m9.figshare.12814637 |
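
The summary mentions read counts per virotype normalized to genome length (Table S2) but does not give the formula. As a rough illustration only, the sketch below assumes one common convention, reads per kilobase of reference genome. The function name, the virotype labels and all numbers are hypothetical and are not taken from the dataset.

```python
def normalize_reads_per_kb(read_counts, genome_lengths_bp):
    """Return reads per kilobase of genome for each virotype.

    Assumption: normalization = reads / (genome length in kb).
    The dataset record does not state the exact formula used.
    """
    normalized = {}
    for virotype, reads in read_counts.items():
        length_bp = genome_lengths_bp[virotype]
        if length_bp <= 0:
            raise ValueError(f"non-positive genome length for {virotype}")
        normalized[virotype] = reads / (length_bp / 1000)
    return normalized


# Hypothetical values: equal raw counts, different genome sizes.
read_counts = {"virotype_A": 500, "virotype_B": 500}
genome_lengths_bp = {"virotype_A": 50_000, "virotype_B": 200_000}

normalized = normalize_reads_per_kb(read_counts, genome_lengths_bp)
for virotype, value in normalized.items():
    print(f"{virotype}: {value:.2f} reads per kb")
```

With equal raw counts, the virotype with the shorter genome receives the higher normalized value, which is why raw counts (Table S1) and length-normalized counts (Table S2) can rank virotypes differently.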