Machine Learning based identification of putative coral pathogens in endangered Caribbean staghorn coral

Bibliographic details
Main authors:	Selwyn, Jason D., Despard, Brecia A., Vollmer, Miles V., Trytten, Emily C., Vollmer, Steven V.
Format:	Dataset
Language:	eng

Description
Summary:	Supplementary Files
SupplementaryFile1.csv.gz – Metadata for field-collected samples with columns:
    “sample_id” – individual sample names
    “health” – “H” healthy and “D” diseased fragments
    “year” – year fragment collected
    “season” – season fragment collected (“S” July and “W” January)
    “site” – location fragment collected from
    “lib.size” – total number of sequenced reads
    “norm.factors” – factor used to normalize read counts of ASVs
SupplementaryFile2.csv.gz – Metadata for tank-collected samples with columns:
    “sample_id” – individual sample names
    “geno” – fragment genotype
    “fragment_id” – fragment identification tracked through repeated sampling
    “tank_id” – tank identification
    “time_treat” – concatenated metric for sampling time, exposure, and disease outcome separated by “_”
        Time – 0, 2, 8
        Exposure – “D” Diseased, “N” Healthy
        Disease Outcome – “D” Diseased, “H” Healthy
    “lib.size” – total number of sequenced reads
    “norm.factors” – factor used to normalize read counts of ASVs
SupplementaryFile3.fasta – FASTA file including complete 16S sequences named with ASV identifier and taxonomy.
SupplementaryFile4.csv.gz – Matrix of the number of reads of each ASV sequenced in each sample. Field and tank samples are combined.
SupplementaryFile5.csv.gz – Matrix of the log2 CPM of each ASV sequenced in each sample. Field and tank samples are combined.
SupplementaryFile6.csv.gz – Complete results for each ASV association.
    “top_classification” – lowest taxonomic classification with more than 80% confidence.
    “taxonomy” – full taxonomy including confidence in each taxonomic level.
    “passedFilter” – indicates taxa filtered from analysis due to rarity and/or lack of observations across sample times.
    “rank_” – machine learning model rankings, median ranking, and model-estimated ranking along with standard error, confidence interval, and FDR-adjusted p-value used to identify important ASVs.
    “ml_retained” – indicates whether the ASV was of above-average importance to ML models. NA values indicate ASVs that were filtered prior to ML modelling.
    “fieldModel_” – ANOVA table results for each ASV testing the effects of health, year, season and all possible interactions, reporting sums of squares, mean squares, numerator and denominator degrees of freedom, F statistic, p-value, and FDR-corrected p-value. NA values are filled for ASVs filtered prior to differential abundance analysis.
    “diffAbundance_healthAssociation” – Marks the health association of ASVs from differential abun
DOI:	10.5281/zenodo.13485942
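
The “time_treat” column in SupplementaryFile2.csv.gz packs three values into one string, so it usually needs to be split before analysis. The following is a minimal Python sketch of that step, not part of the dataset itself. It assumes the three parts appear in the order given in the summary (time, exposure, disease outcome) and look like “8_D_H”; the actual strings in the file may differ and should be checked first. The names `TimeTreat` and `parse_time_treat` are hypothetical.

```python
from typing import NamedTuple

# Code tables taken from the record's description of "time_treat".
EXPOSURE = {"D": "diseased", "N": "healthy"}
OUTCOME = {"D": "diseased", "H": "healthy"}
VALID_TIMES = (0, 2, 8)


class TimeTreat(NamedTuple):
    """Decoded parts of one "time_treat" value (hypothetical helper type)."""
    time: int
    exposure: str
    outcome: str


def parse_time_treat(value: str) -> TimeTreat:
    """Split a string such as "8_D_H" into time, exposure and outcome.

    Assumption: the field order is time, exposure, outcome, joined by "_".
    """
    parts = value.strip().split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts separated by '_', got {value!r}")
    time_str, exposure_code, outcome_code = parts

    time = int(time_str)  # raises ValueError if the first part is not a number
    if time not in VALID_TIMES:
        raise ValueError(f"unexpected sampling time {time} in {value!r}")
    if exposure_code not in EXPOSURE:
        raise ValueError(f"unexpected exposure code in {value!r}")
    if outcome_code not in OUTCOME:
        raise ValueError(f"unexpected outcome code in {value!r}")

    return TimeTreat(time, EXPOSURE[exposure_code], OUTCOME[outcome_code])
```

Raising an error on unknown codes, rather than passing them through, makes any mismatch between this assumed layout and the real file visible immediately.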