Machine Learning based identification of putative coral pathogens in endangered Caribbean staghorn coral
Main authors: | , , , , |
---|---|
Format: | Dataset |
Language: | eng |
Summary: | Supplementary Files
SupplementaryFile1.csv.gz – Metadata for field collected samples with columns:
“sample_id” – individual sample names.
“health” – “H” for healthy and “D” for diseased fragments.
“year” – year the fragment was collected.
“season” – season the fragment was collected (“S” for July and “W” for January).
“site” – location the fragment was collected from.
“lib.size” – total number of sequenced reads.
“norm.factors” – factor used to normalize read counts of ASVs.
SupplementaryFile2.csv.gz – Metadata for tank collected samples with columns:
“sample_id” – individual sample names.
“geno” – fragment genotype.
“fragment_id” – fragment identification tracked through repeated sampling.
“tank_id” – tank identification.
“time_treat” – concatenated metric for sampling time, exposure, and disease outcome, separated by “_”:
Time – 0, 2, 8
Exposure – “D” Diseased, “N” Healthy
Disease Outcome – “D” Diseased, “H” Healthy
“lib.size” – total number of sequenced reads.
“norm.factors” – factor used to normalize read counts of ASVs.
SupplementaryFile3.fasta – FASTA file including complete 16S sequences named with ASV identifier and taxonomy.
SupplementaryFile4.csv.gz – Matrix of the number of reads of each ASV sequenced in each sample. Combines both field and tank samples.
SupplementaryFile5.csv.gz – Matrix of the log2 CPM of each ASV sequenced in each sample. Combines both field and tank samples.
SupplementaryFile6.csv.gz – Complete results for each ASV association.
“top_classification” – lowest taxonomic classification with more than 80% confidence.
“taxonomy” – Full taxonomy including confidence in each taxonomic level.
“passedFilter” – indicates taxa filtered from analysis due to rarity and/or lack of observations across sample times.
“rank_*” – machine learning model rankings, median ranking, and model estimated ranking along with standard error, confidence interval, and FDR adjusted p-value used to identify important ASVs.
“ml_retained” – indicates whether the ASV was of above-average importance to the ML models. NA values indicate ASVs that were filtered prior to ML modelling.
“fieldModel_*” – ANOVA table results for each ASV testing the effects of health, year, season and all possible interactions indicating:
Sums of squares, mean squares, numerator and denominator degrees of freedom, F statistic, p-value, and FDR corrected p-value.
NA values are filled for ASVs filtered prior to differential abundance analysis.
“diffAbundance_healthAssociation” – Marks the health association of ASVs from differential abun |
---|---|
DOI: | 10.5281/zenodo.13485942 |
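
The “time_treat” column described in the summary packs three fields into one string, and “lib.size” and “norm.factors” relate raw read counts to the log2 CPM matrix. The Python sketch below illustrates both ideas. It rests on several assumptions that the record does not confirm: that the fields in “time_treat” appear in the order time, exposure, outcome (for example a hypothetical value such as "8_D_H"); that every value has exactly three parts; and that CPM is computed against an effective library size of lib.size × norm.factors with a small pseudocount. The function names, the `prior_count` parameter, and the example values are illustrative only and may not match how SupplementaryFile5.csv.gz was actually produced.

```python
import math
from typing import NamedTuple

# Allowed codes, taken from the column description in the summary.
VALID_TIMES = {0, 2, 8}
VALID_EXPOSURES = {"D", "N"}  # "D" diseased, "N" healthy
VALID_OUTCOMES = {"D", "H"}   # "D" diseased, "H" healthy


class TimeTreat(NamedTuple):
    time: int
    exposure: str
    outcome: str


def parse_time_treat(value: str) -> TimeTreat:
    """Split a time_treat string into its three parts.

    Assumes the order time_exposure_outcome; this order is inferred
    from the description, not confirmed against the data file.
    """
    parts = value.strip().split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '_'-separated parts, got {value!r}")
    time_text, exposure, outcome = parts
    time = int(time_text)  # raises ValueError if the time is not an integer
    if time not in VALID_TIMES:
        raise ValueError(f"unexpected time {time} in {value!r}")
    if exposure not in VALID_EXPOSURES:
        raise ValueError(f"unexpected exposure {exposure!r} in {value!r}")
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"unexpected outcome {outcome!r} in {value!r}")
    return TimeTreat(time, exposure, outcome)


def log2_cpm(count: float, lib_size: float, norm_factor: float,
             prior_count: float = 0.5) -> float:
    """Log2 counts per million against an effective library size.

    Assumption: effective library size = lib_size * norm_factor, and a
    flat pseudocount (prior_count) is added to avoid log2(0). The
    dataset's own values may have been computed differently.
    """
    effective_lib_size = lib_size * norm_factor
    return math.log2((count + prior_count) / effective_lib_size * 1e6)


if __name__ == "__main__":
    print(parse_time_treat("8_D_H"))
    print(log2_cpm(10, 1_000_000, 1.0))
```

If the real file contains values that do not fit the three-part pattern, the parser raises `ValueError` rather than guessing, which makes mismatches with the assumed layout visible early.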