Data repository associated with 'A Functional Map of the Human Intrinsically Disordered Proteome'
Main author: | |
---|---|
Format: | Dataset |
Language: | English |
Summary: | ES_MAP.zip
a hierarchically clustered map of the human IDR-ome
.cdt and .gtr files - outputs of Cluster3.0 software
can be visualized using JavaTreeView (see Tutorial_ES.pdf)
TUTORIAL.zip, information on:
visualization and analysis of the human IDR-ome map
search for proteins of interest and exploratory analyses of clusters
automatic export and analysis of exported clusters (code available at https://github.com/IPritisanac/ES_PW)
IDROME_SEQUENCES.zip
human proteome fasta file
IDRome fasta file
SPOT-Disorder v1.0 disorder boundaries
13 044 unique protein sequences with at least one IDR (>=30 amino acids)
21 252 total unique human IDRs
IDR_ALN.zip
alignments of IDR sequences across ENSEMBL orthologs
19 459 IDR alignments
UniProt ID and IDR boundaries for the human sequence are indicated in the name of the file
FAIDR_TSTATS.zip
hierarchical clustering of FAIDR t-statistics for 148 GO terms
.cdt, .gtr files from Cluster3.0
can be visualized using JavaTreeView
reveals the most predictive molecular features for the top performing 148 models
CLUSTERS_EXPLORE.zip
clusters obtained through exploratory analysis of the map provided in ES_MAP.zip
93 exported clusters in .cdt file format
CLUSTERS_AUTO.zip
clusters extracted from the hierarchically clustered IDR-ome map at a range of distance thresholds (0.4 - 0.8) in .cdt file format
distance refers to the uncentered correlation distance between vectors of Z-scores representing human IDRs
clusters extracted at different distance thresholds are split into separate archives
AUTO_GO_FEATS.xlsx - summary of GO-term overrepresentation and feature enrichment analyses; each distance threshold is in a separate sheet
FAIDR_HIGH_AUC_PPV_GO.zip
target files with annotations of 148 GO terms for which good quality FAIDR models could be obtained (AUC >= 0.7, PPV >= 0.4)
file format: three columns; 1st: IDR ID (includes IDR boundaries); 2nd: protein UniProt ID; 3rd: annotation of the protein to a GO term (1 if known to be associated with the GO term, 0 if not)
|
---|---|
DOI: | 10.5281/zenodo.10812874 |
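
Two details in the summary can be illustrated with a minimal Python sketch: the uncentered correlation distance between Z-score vectors (used for CLUSTERS_AUTO.zip) and the three-column target-file layout (FAIDR_HIGH_AUC_PPV_GO.zip). The function names are hypothetical. The distance is assumed to follow the usual definition, one minus the cosine of the angle between the vectors with no mean subtraction. The record does not state the column delimiter of the target files, so whitespace (tabs or spaces) is assumed here.

```python
import math


def uncentered_correlation_distance(x, y):
    """Return 1 - (x . y) / (|x| * |y|), computed without subtracting the means.

    Assumed definition: 0 for vectors pointing the same way, 1 for orthogonal
    vectors, 2 for vectors pointing in opposite directions.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_x * norm_y)


def parse_target_lines(lines):
    """Parse records of the form: IDR ID, protein UniProt ID, 0/1 annotation.

    Assumption: columns are separated by whitespace. Lines that do not have
    exactly three fields ending in 0 or 1 (blank lines, possible headers) are
    skipped rather than guessed at.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[2] not in ("0", "1"):
            continue
        records.append((fields[0], fields[1], int(fields[2])))
    return records
```

To read a real target file, pass an open file handle to `parse_target_lines`, since it accepts any iterable of lines. If the files turn out to use a different delimiter, only the `line.split()` call needs to change.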