Data repository associated with 'A Functional Map of the Human Intrinsically Disordered Proteome'


Bibliographic details
Author: Pritisanac, Iva
Format: Dataset
Language: English
Description
Summary:

ES_MAP.zip
  - a hierarchically clustered map of the human IDR-ome
  - .cdt and .gtr files (outputs of the Cluster3.0 software) that can be visualized using JavaTreeView (see Tutorial_ES.pdf)

TUTORIAL.zip, information on:
  - visualization and analysis of the human IDR-ome map
  - search for proteins of interest and exploratory analyses of clusters
  - automatic export and analysis of exported clusters (code available at https://github.com/IPritisanac/ES_PW)

IDROME_SEQUENCES.zip
  - human proteome fasta file
  - IDRome fasta file
  - SPOT-Disorder v1.0 disorder boundaries
  - 13 044 unique protein sequences with at least one IDR (>=30 amino acids)
  - 21 252 total unique human IDRs

IDR_ALN.zip
  - alignments of IDR sequences across ENSEMBL orthologs
  - 19 459 IDR alignments
  - the UniProt ID and the IDR boundaries for the human sequence are indicated in the name of each file

FAIDR_TSTATS.zip
  - hierarchical clustering of FAIDR t-statistics for 148 GO terms
  - .cdt and .gtr files from Cluster3.0 that can be visualized using JavaTreeView
  - reveals the most predictive molecular features for the top performing 148 models

CLUSTERS_EXPLORE.zip
  - clusters obtained through exploratory analysis of the map provided in ES_MAP.zip
  - 93 exported clusters in .cdt file format

CLUSTERS_AUTO.zip
  - clusters extracted from the hierarchically clustered IDR-ome map at a range of distance thresholds (0.4 - 0.8) in .cdt file format
  - distance refers to the uncentered correlation distance between vectors of Z-scores representing human IDRs
  - clusters extracted at different distance thresholds are split into separate archives

AUTO_GO_FEATS.xlsx
  - summary of GO-term overrepresentation and feature enrichment analyses; each distance threshold is in a separate sheet

FAIDR_HIGH_AUC_PPV_GO.zip
  - target files with annotations of 148 GO terms for which good quality FAIDR models could be obtained (AUC >= 0.7, PPV >= 0.4)
  - file format: three columns; 1st: IDR ID (includes IDR boundaries); 2nd: protein UniProt ID; 3rd: annotation of the protein to a GO term (1 if known to be associated with the GO term, 0 if not)
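The three-column layout described for the FAIDR_HIGH_AUC_PPV_GO.zip target files could be read along the following lines. This is only a minimal sketch: the record does not state the column delimiter, so whitespace separation is an assumption, and the function name, the `TargetRow` type, and the example identifiers are hypothetical placeholders, not values taken from the dataset.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class TargetRow(NamedTuple):
    idr_id: str      # 1st column: IDR ID (includes the IDR boundaries)
    uniprot_id: str  # 2nd column: UniProt ID of the protein
    label: int       # 3rd column: 1 = associated with the GO term, 0 = not


def parse_target_lines(lines: Iterable[str]) -> List[TargetRow]:
    """Parse three-column target rows, skipping blank lines."""
    rows: List[TargetRow] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: columns are separated by whitespace (tabs or spaces).
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        idr_id, uniprot_id, label = fields
        if label not in ("0", "1"):
            raise ValueError(f"annotation must be 0 or 1, got {label!r}")
        rows.append(TargetRow(idr_id, uniprot_id, int(label)))
    return rows


# Hypothetical rows for illustration only; real IDs come from the archive.
example_lines = [
    "IDR_EXAMPLE_1\tP00001\t1",
    "IDR_EXAMPLE_2\tP00002\t0",
    "IDR_EXAMPLE_3\tP00001\t1",
]
rows = parse_target_lines(example_lines)
label_counts = Counter(row.label for row in rows)
print(label_counts)
```

For a real file, the same function can be applied to an open file handle, for example `parse_target_lines(open(path))`, where `path` points at one extracted target file.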
DOI:10.5281/zenodo.10812874
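
The uncentered correlation distance mentioned in the description of CLUSTERS_AUTO.zip can be illustrated with a short sketch. It assumes the usual definition, one minus the uncentered correlation (equivalently, one minus the cosine similarity) of two vectors, and it ignores missing values, which clustering software may treat specially. The function name and the example vectors are hypothetical and are not taken from the dataset.

```python
import math
from typing import Sequence


def uncentered_correlation_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - (x . y) / (|x| * |y|) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_x * norm_y)


# Hypothetical Z-score vectors for two IDRs (illustration only).
z_a = [1.5, -0.2, 0.7, 2.1]
z_b = [1.1, 0.1, 0.9, 1.8]
distance = uncentered_correlation_distance(z_a, z_b)

# Under this definition the distance lies in [0, 2]; a cluster cut at a
# threshold of 0.4 would group only IDRs whose vectors point in closely
# similar directions.
print(distance < 0.4)
```

Because the vectors are not mean-centered, this measure depends on the direction of the Z-score vector rather than on its magnitude: scaling one vector by a positive constant leaves the distance unchanged.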