Interpretable Prediction of SARS-CoV-2 Epitope-Specific TCR Recognition Using a Pre-Trained Protein Language Model

The emergence of the novel coronavirus, designated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has posed a significant threat to public health worldwide. Although progress has been made in reducing hospitalizations and deaths due to SARS-CoV-2, challenges remain owing to the emergence of SARS-CoV-2 variants that exhibit high transmission rates, increased disease severity, and the ability to evade humoral immunity. Epitope-specific T-cell receptor (TCR) recognition is key to determining T-cell immunogenicity for SARS-CoV-2 epitopes. Although several data-driven methods for predicting epitope-specific TCR recognition have been proposed, the task remains challenging because of the enormous diversity of TCRs and the scarcity of available training data. Self-supervised transfer learning has recently proven useful for extracting information from unlabeled protein sequences, increasing the predictive performance of fine-tuned models while requiring relatively little training data. This study presents a deep-learning model built by fine-tuning pre-trained protein embeddings learned from a large corpus of protein sequences. The fine-tuned model showed markedly high predictive performance and outperformed a recent Gaussian process-based prediction model. The output attentions captured by the deep-learning model highlighted critical amino acid positions in SARS-CoV-2 epitope-specific TCRβ sequences that are highly associated with viral escape from the T-cell immune response.
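The methodological core described above, fine-tuning a pre-trained protein language model to classify epitope-TCR pairs and inspecting its attention weights for interpretation, can be sketched roughly as follows. This is a hypothetical illustration rather than the authors' implementation: the encoder (a small ESM-2 checkpoint), the sequence-pairing scheme, the pooling, and all hyperparameters are assumptions made here for the example.

# Hypothetical sketch (not the authors' code): fine-tune a pre-trained protein
# language model for epitope-specific TCR recognition, framed as binary
# classification of (epitope, TCRbeta CDR3) sequence pairs.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

PRETRAINED = "facebook/esm2_t6_8M_UR50D"  # assumed stand-in for the paper's protein LM

class EpitopeTCRClassifier(nn.Module):
    def __init__(self, pretrained: str = PRETRAINED):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(pretrained)
        hidden = self.encoder.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(hidden, 128), nn.ReLU(), nn.Dropout(0.1), nn.Linear(128, 1)
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask,
                           output_attentions=True)
        # Mean-pool token embeddings over non-padding positions.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
        logits = self.head(pooled).squeeze(-1)
        # out.attentions holds one attention tensor per encoder layer; per-residue
        # attention can later be inspected to flag critical epitope/CDR3 positions.
        return logits, out.attentions

tokenizer = AutoTokenizer.from_pretrained(PRETRAINED)
model = EpitopeTCRClassifier()

# Hypothetical input pair: a known SARS-CoV-2 spike epitope and an example
# TCRbeta CDR3 sequence, concatenated into a single input (one possible design).
epitope, cdr3 = "YLQPRTFLL", "CASSIRSSYEQYF"
batch = tokenizer(epitope + cdr3, return_tensors="pt")
with torch.no_grad():
    logits, attentions = model(batch["input_ids"], batch["attention_mask"])
print(torch.sigmoid(logits).item())  # predicted recognition probability
print(len(attentions))               # number of layers with attention maps

In a real setting the classification head would be trained on labeled epitope-TCRβ pairs with a binary cross-entropy loss, and the returned attention maps would typically be averaged over heads and layers to rank amino acid positions by attention weight; the paper's exact architecture and interpretation procedure may differ from this sketch.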

Bibliographic Details
Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2024-05, Vol. 21 (3), pp. 428-438
Authors: Yoo, Sunyong; Jeong, Myeonghyeon; Seomun, Subhin; Kim, Kiseong; Han, Youngmahn
Format: Article
Language: English
Publisher: IEEE
DOI: 10.1109/TCBB.2024.3368046
ISSN: 1545-5963
EISSN: 1557-9964
PMID: 38381638
Online access: Full text
Subjects:
Amino acids
Attention mechanism
Attention mechanisms
Coronaviruses
COVID-19
Deep learning
Disease transmission
epitope
Epitopes
Gaussian process
Humoral immunity
Immune response
Immune response (cell-mediated)
Immune system
Immunogenicity
Lymphocytes
Lymphocytes T
Machine learning
Performance prediction
Prediction models
Predictive models
Proteins
Public health
Recognition
SARS-CoV-2
Severe acute respiratory syndrome coronavirus 2
T cell receptors
T-cell receptor
Transfer learning
Viral diseases