BigMHC Training and Evaluation Data
Main author: | |
---|---|
Format: | Dataset |
Language: | eng |
Summary: | All training and evaluation data used in the BigMHC study
(https://doi.org/10.1101/2022.08.29.505690).
All code is freely available at https://github.com/KarchinLab/bigmhc
---
CSV Columns
mhc - MHC-I allele
pep - peptide sequence if epitope data or mutated peptide sequence if neoepitope data
tgt - target value of 1 (presented/immunogenic) or 0 (non-presented/non-immunogenic)
manafest.csv columns also include wtp (wild-type peptide) and gene (the name of the mutated gene)
pseudoseqs.csv columns include the MHC-I allele along with the index and amino acid of the aligned positions.
All other columns are the outputs of MHC-I epitope presentation and immunogenicity predictors.
---
datasets.zip contains the curated training, validation, and testing datasets along with a summary of the number of negatives and positives for each allele in each dataset (summary.csv):
- el_test.csv - epitope presentation evaluation data and all model predictions
- el_train.csv - epitope presentation training data
- el_val.csv - epitope presentation validation data
- im_test.csv - immunogenicity transfer learning evaluation data and all model predictions
- im_train.csv - immunogenicity transfer learning training data
- im_val.csv - immunogenicity transfer learning validation data
- iedb.csv - infectious disease epitope evaluation data and all model predictions
- summary.csv - table of positives and negatives across each allele for each dataset
- manafest.csv - neoepitope immunogenicity data validated using MANAFEST assays
- pseudoseqs.csv - one-hot encoded MHC representations
eltrainval_models.zip contains the models used to evaluate BigMHC on el_test.csv (the production models can be found in the GitHub repository).
---
The SHA-256 checksums are listed below:
datasets.zip - 0d152a452756cf2e0014ffced6afc25118e7c11cf1f626b26e49f50f79edffaa
eltrainval_models.zip - e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223
manafest.csv - accf19b8bb797ec842c3ee1b1ce1966feb35035df746f7a56b2994403ba1ad99
pseudoseqs.csv - cd1fa24fb4c9fc0ee592a3d753458c4e3abed0d5cc4ca76e325aa274df8e900a |
---|---|
DOI: | 10.17632/dvmz6pkzvb.3 |
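
The checksums and column layout described in the summary can be put to use with a few lines of standard-library Python. The following is a minimal sketch, not part of the dataset or the BigMHC code: the helper names `sha256_of`, `matches_record`, and `count_targets` are hypothetical, the sample rows are invented, and the allele naming format shown is an assumption about how the real files are written. It hashes a downloaded file for comparison against the listed SHA-256 sum, and tallies `tgt` values per `mhc` allele, which is roughly the kind of per-allele count that summary.csv is described as containing.

```python
import csv
import hashlib
import io
from collections import Counter, defaultdict

# Checksums copied verbatim from the record above.
EXPECTED_SHA256 = {
    "datasets.zip": "0d152a452756cf2e0014ffced6afc25118e7c11cf1f626b26e49f50f79edffaa",
    "eltrainval_models.zip": "e8500173cb2afbe5f8e8c0ebc60785c0de6d91aab622f8b83fff3b4e65b43223",
    "manafest.csv": "accf19b8bb797ec842c3ee1b1ce1966feb35035df746f7a56b2994403ba1ad99",
    "pseudoseqs.csv": "cd1fa24fb4c9fc0ee592a3d753458c4e3abed0d5cc4ca76e325aa274df8e900a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, name):
    """True if the file at `path` hashes to the checksum listed for `name`."""
    return sha256_of(path) == EXPECTED_SHA256[name]


def count_targets(csv_file):
    """Tally tgt values (1 or 0) per allele from rows with mhc and tgt columns."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        # Assumption: tgt is written as 0/1, possibly as 0.0/1.0.
        counts[row["mhc"]][int(float(row["tgt"]))] += 1
    return counts


# Invented rows for illustration only; these are not taken from the dataset,
# and the allele naming format used in the real files is an assumption.
SAMPLE = """mhc,pep,tgt
HLA-A*02:01,AAAAAAAAA,1
HLA-A*02:01,CCCCCCCCC,0
HLA-B*07:02,DDDDDDDDD,0
"""

if __name__ == "__main__":
    for allele, tally in count_targets(io.StringIO(SAMPLE)).items():
        print(allele, "positives:", tally[1], "negatives:", tally[0])
```

After downloading, a call such as `matches_record("datasets.zip", "datasets.zip")` should return `True` if the local copy is intact. `count_targets` also accepts an open file handle, for example one opened with `open(path, newline="")` on a CSV extracted from datasets.zip.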