BigMHC Training and Evaluation Data
Main Author: | |
---|---|
Format: | Dataset |
Language: | eng |
Summary: | All training and evaluation data used in the BigMHC study
(https://doi.org/10.1101/2022.08.29.505690).
All code is freely available at https://github.com/KarchinLab/bigmhc
---
CSV Columns
mhc - MHC-I allele
pep - peptide sequence if epitope data or mutated peptide sequence if neoepitope data
tgt - target value of 1 (presented/immunogenic) or 0 (non-presented/non-immunogenic)
manafest.csv columns also include wtp (wild-type peptide) and gene (the name of the mutated gene)
pseudoseqs.csv columns include the MHC-I allele along with the index and amino acid of the aligned positions.
All other columns are the outputs of MHC-I epitope presentation and immunogenicity predictors.
---
datasets.zip contains the curated training, validation, and testing datasets along with a summary of the number of negatives and positives for each allele in each dataset (summary.csv):
- el_test.csv - epitope presentation evaluation data and all evaluated model predictions
- el_train.csv - epitope presentation training data
- el_val.csv - epitope presentation validation data
- im_test.csv - immunogenicity transfer learning evaluation data and all model predictions
- im_train.csv - immunogenicity transfer learning training data
- im_val.csv - immunogenicity transfer learning validation data
- iedb.csv - infectious disease epitope evaluation data and all model predictions
- summary.csv - table of positives and negatives across each allele for each dataset
- manafest.csv - neoepitope immunogenicity data validated using MANAFEST assays
- pseudoseqs.csv - one-hot encoded MHC representations
el.csv.zip contains the predictions of the BigMHC production models and all other methods on all EL data (train, val, test). The pMHCs were filtered so that all other methods can score them (e.g. peptide lengths 8-11).
eltrainval_models.zip contains the models used to evaluate BigMHC on el_test.csv (the production models can be found in the GitHub repository).
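As a rough illustration of how the mhc and tgt columns fit together, the sketch below tallies negatives and positives per allele, which is the kind of table summary.csv is described as holding. It assumes each CSV is comma-delimited with a header row using the column names listed above and that tgt is written as 0 or 1 (possibly as 0.0 or 1.0). The function name count_labels is hypothetical and not part of the BigMHC code.

```python
import csv
from collections import defaultdict


def count_labels(csv_file):
    """Count negatives and positives per MHC-I allele.

    csv_file is any open text file (or file-like object) with a header row
    that contains at least the mhc and tgt columns.
    """
    counts = defaultdict(lambda: [0, 0])  # allele -> [negatives, positives]
    for row in csv.DictReader(csv_file):
        # Assumption: tgt is 0 or 1, possibly serialized as 0.0 or 1.0.
        label = int(float(row["tgt"]))
        counts[row["mhc"]][label] += 1
    return dict(counts)
```

After extracting datasets.zip, passing an open handle for a file such as el_val.csv (opened with newline="") should return a dictionary mapping each allele to its [negatives, positives] pair, which can then be compared against summary.csv.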
---
sha256 sums are below:
datasets.zip - 0d152a452756cf2e0014ffced6afc25118e7c11cf1f626b26e49f50f79edffaa
el.csv.zip - cb94b42406b96a3b13d941cf87dd43f4f53e9ebfbc3d1619f0e43327f1fb6395
eltrainval_models.zip - e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223
manafest.csv - accf19b8bb797ec842c3ee1b1ce1966feb35035df746f7a56b2994403ba1ad99
pseudoseqs.csv - cd1fa24fb4c9fc0ee592a3d753458c4e3abed0d5cc4ca76e325aa274df8e900a |
---|---|
DOI: | 10.17632/dvmz6pkzvb.4 |
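The SHA-256 sums listed in the summary can be checked after download with a few lines of standard-library Python. This is a minimal sketch: the digests are copied from the record, while the helper names sha256_of and verify are hypothetical, and the files are assumed to sit under the same names in a local directory.

```python
import hashlib

# Expected digests copied from the sha256 list in the summary.
EXPECTED_SHA256 = {
    "datasets.zip": "0d152a452756cf2e0014ffced6afc25118e7c11cf1f626b26e49f50f79edffaa",
    "el.csv.zip": "cb94b42406b96a3b13d941cf87dd43f4f53e9ebfbc3d1619f0e43327f1fb6395",
    "eltrainval_models.zip": "e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223",
    "manafest.csv": "accf19b8bb797ec842c3ee1b1ce1966feb35035df746f7a56b2994403ba1ad99",
    "pseudoseqs.csv": "cd1fa24fb4c9fc0ee592a3d753458c4e3abed0d5cc4ca76e325aa274df8e900a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, verify("datasets.zip", EXPECTED_SHA256["datasets.zip"]) should return True for an intact download. Reading in chunks keeps memory use flat for the larger archives.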