The Observed T Cell Receptor Space database enables paired-chain repertoire mining, coherence analysis, and language modeling
Published in: | Cell reports (Cambridge) 2024-09, Vol.43 (9), p.114704, Article 114704 |
---|---|
Format: | Article |
Language: | eng |
Summary: | T cell activation is governed through T cell receptors (TCRs), heterodimers of two sequence-variable chains (often an α and β chain) that synergistically recognize antigen fragments presented on cell surfaces. Despite this, existing repositories collect only single-chain, not paired-chain, TCR sequence data. We addressed this gap by creating the Observed TCR Space (OTS) database, a source of consistently processed and annotated, full-length, paired-chain TCR sequences. Currently, OTS contains 5.35 million redundant (1.63 million non-redundant), predominantly human sequences from 50 studies and at least 75 individuals. Using OTS, we identify pairing biases, public TCRs, and distinct chain coherence patterns relative to antibodies. We also release a paired-chain TCR language model, providing paired embedding representations and a method for residue in-filling conditional on the partner chain. OTS will be updated as a central community resource and is freely downloadable and available as a web application.
• Consistent processing of paired TCR sequences from 50+ studies (75+ individuals)
• Data available online in a searchable format: https://opig.stats.ox.ac.uk/webapps/ots
• Analysis of gene pairing biases, public clones, and coherence patterns
• Open release of a full-length, paired TCR language model (TCRLang-paired)
Raybould and Greenshields-Watson et al. consistently process single-cell T cell receptor (TCR) sequencing datasets from 50 studies and release these 5.3 million full-length reads as the Observed TCR Space database. They explore V-gene pairing biases, public clones, and chain coherence patterns and train a paired-chain TCR language model. |
---|---|
ISSN: | 2211-1247 |
DOI: | 10.1016/j.celrep.2024.114704 |