Underlying data for "Mapping glycoprotein structure reveals Flaviviridae evolutionary history"

Bibliographic details
Main authors: Mifsud, Jonathon C.O.; Lytras, Spyros; Oliver, Michael R.; Toon, Kamilla; Costa, Vincenzo A.; Holmes, Edward C.; Grove, Joe
Format: Dataset
Language: English (eng)
Description
Summary: This repository houses the underlying data for "Mapping glycoprotein structure reveals Flaviviridae evolutionary history", authored by Jonathon C.O. Mifsud, Spyros Lytras, Michael R. Oliver, Kamilla Toon, Vincenzo A. Costa, Edward C. Holmes, and Joe Grove. The dataset is organised into several directories:
- flaviviridae_foldseek_output: Contains the Foldseek output and the parsing scripts that extract the lowest e-value hit for each taxon and reference.
- flaviviridae_structure_blocks: Contains the Flaviviridae structures generated by ColabFold and ESMFold. Structures are organised by taxa and numbered by block. Polyprotein sequences were broken into 300 residue blocks, each overlapping by 100 residues. Numbering starts at Block_0 (residues 1-300) and continues sequentially (e.g. Block_1 = residues 100-400, Block_2 = residues 200-500, ...). This dataset constitutes the Flaviviridae protein foldome referred to in the main text.
- foldseek_reference_structures: Contains all structures used as references in the Foldseek analysis, including the Bole Tick Virus 4 proteins described in figure 3.
- glycoprotein_structural_alignments_and_trees: Contains all files needed to replicate the trees for the E, E1 and E2 glycoproteins. The underlying code can be found in structural_alignments_code.ipynb. This directory also contains the complete glycoprotein structure predictions (refolded_fullglyco).
- ns5b_alignments_and_trees: Contains all alignment files, both trimmed and untrimmed, for the NS5b RdRp. These include alignments built with different parameters and methods, and those used in the stratified MUSCLE analysis. Also includes related scripts.
- sequence_benchmarks: Contains the files and scripts underlying the sequence benchmark analysis.
- sequences: Holds sequence files, including full genome sequences of Flaviviridae in .fasta format, novel sequences identified in our study, and protein sequences extracted for alignment purposes. It also contains the script for creating the sequence blocks used in the main analyses.
- stratified_MUSCLE_analysis: Contains the files and scripts to replicate the stratified MUSCLE analysis. Underlying tree files are located in ns5b_alignments_and_trees.
- t2rnase_alignments_and_trees: Contains all alignment and tree files, both trimmed and untrimmed, for RNase T2.
- tables: Provides metadata tables, including InterPro domain annotations, RNase T2 analysis summaries, phylogenetic model finder for the glycoprotein structural phylogenetics, and n
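
The block numbering described for flaviviridae_structure_blocks can be illustrated with a minimal Python sketch. This is not the authors' script (that one is in the sequences directory); the function name split_into_blocks and its parameters are hypothetical. The description gives a 100 residue overlap but example boundaries that advance by about 100 residues per block, so the step between block starts is left as a parameter. The default of 100 is an assumption that follows the worked example.

```python
def split_into_blocks(sequence, block_len=300, step=100):
    """Split a sequence into overlapping, numbered blocks.

    Returns a list of (name, start, end, subsequence) tuples, where start
    and end are 1-based inclusive residue positions.

    block_len=300 comes from the dataset description. step=100 is an
    ASSUMPTION based on the description's example boundaries; pass
    step=200 for blocks that share exactly 100 residues.
    """
    if block_len <= 0 or step <= 0:
        raise ValueError("block_len and step must be positive")

    blocks = []
    for index, offset in enumerate(range(0, len(sequence), step)):
        chunk = sequence[offset:offset + block_len]
        blocks.append((f"Block_{index}", offset + 1, offset + len(chunk), chunk))
        # Stop once a block has reached the end of the sequence, so that
        # no trailing block is fully contained in the previous one.
        if offset + block_len >= len(sequence):
            break
    return blocks


if __name__ == "__main__":
    # Hypothetical 500 residue polyprotein used only for illustration.
    toy_polyprotein = "A" * 500
    for name, start, end, _ in split_into_blocks(toy_polyprotein):
        print(name, start, end)
```

With these defaults a 500 residue sequence yields Block_0 (1-300), Block_1 (101-400) and Block_2 (201-500). The exact boundaries in the deposited files may differ by one residue from this sketch.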
DOI:10.5281/zenodo.10616317