HLAB: learning the BiLSTM features from the ProtBert-encoded proteins for the class I HLA-peptide binding prediction
Published in: | Briefings in bioinformatics 2022-09, Vol.23 (5) |
---|---|
Main authors: | Zhang, Yaqi ; Zhu, Gancheng ; Li, Kewei ; Li, Fei ; Huang, Lan ; Duan, Meiyu ; Zhou, Fengfeng |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 5 |
container_start_page | |
container_title | Briefings in bioinformatics |
container_volume | 23 |
creator | Zhang, Yaqi ; Zhu, Gancheng ; Li, Kewei ; Li, Fei ; Huang, Lan ; Duan, Meiyu ; Zhou, Fengfeng |
description | Abstract
Human Leukocyte Antigen (HLA) is a type of molecule that resides on the surface of most human cells and plays an essential role in the immune system's response to invading agents. T cell antigen receptors may recognize the HLA-peptide complexes on the surfaces of cancer cells and destroy these cancer cells through cytotoxic T lymphocytes. The computational determination of HLA-binding peptides will facilitate the rapid development of cancer immunotherapies. This study hypothesized that peptide features encoded by natural language processing models may be further enriched by another deep neural network. The hypothesis was tested by using a Bi-directional Long Short-Term Memory (BiLSTM) network to extract features from the encodings that the pretrained Protein Bidirectional Encoder Representations from Transformers (ProtBert) model produces for class I HLA (HLA-I)-binding peptides. The experimental data showed that the proposed HLAB feature engineering algorithm outperformed existing ones in detecting HLA-I-binding peptides. The extensive evaluation shows that HLAB outperforms all seven existing studies in AUC when predicting the peptides that bind to the HLA-A*01:01 allele, and achieves the best average AUC values on six of the seven k-mer prediction tasks (k = 8, 9, ..., 14, where each task predicts binding for polypeptides consisting of k amino acids), the exception being the 9-mer task. The source code and the fine-tuned feature extraction models are available at http://www.healthinformaticslab.org/supp/resources.php. |
doi_str_mv | 10.1093/bib/bbac173 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1467-5463 |
ispartof | Briefings in bioinformatics, 2022-09, Vol.23 (5) |
issn | 1467-5463 ; 1477-4054 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9487590 |
source | Access via Oxford University Press (Open Access Collection); Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Business Source Complete; PubMed Central |
subjects | Algorithms ; Amino acids ; Antigens ; Artificial neural networks ; Binding ; Cancer ; Cancer immunotherapy ; Coders ; Computer applications ; Feature extraction ; Histocompatibility antigen HLA ; Immune system ; Invasiveness ; Leukocytes ; Long short-term memory ; Lymphocytes ; Lymphocytes T ; Machine learning ; Natural language processing ; Neural networks ; Peptides ; Polypeptides ; Predictions ; Problem Solving Protocol ; Proteins ; Source code |
title | HLAB: learning the BiLSTM features from the ProtBert-encoded proteins for the class I HLA-peptide binding prediction |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-12T19%3A06%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=HLAB:%20learning%20the%20BiLSTM%20features%20from%20the%20ProtBert-encoded%20proteins%20for%20the%20class%20I%20HLA-peptide%20binding%20prediction&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Zhang,%20Yaqi&rft.date=2022-09-20&rft.volume=23&rft.issue=5&rft.issn=1467-5463&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbac173&rft_dat=%3Cproquest_pubme%3E2717367643%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2717367643&rft_id=info:pmid/35514183&rft_oup_id=10.1093/bib/bbac173&rfr_iscdi=true |
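
The abstract in this record reports AUC values separately for each peptide length (k = 8 to 14). As a rough illustration of what such a per-length evaluation involves, the sketch below computes a pairwise (Mann-Whitney) AUC for each k-mer group. It is not the authors' code: the function names `auc` and `auc_by_length`, the record layout `(peptide, label, score)`, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: share of (binder, non-binder) pairs ranked correctly.

    Ties between a binder score and a non-binder score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one binder and one non-binder")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_length(records, lengths=range(8, 15)):
    """Group (peptide, label, score) records by peptide length k and
    return {k: AUC} for every k in `lengths` that has data."""
    groups = defaultdict(lambda: ([], []))
    for peptide, label, score in records:
        k = len(peptide)
        if k in lengths:
            ys, ss = groups[k]
            ys.append(label)
            ss.append(score)
    return {k: auc(ys, ss) for k, (ys, ss) in sorted(groups.items())}


# Hypothetical toy data, invented for illustration only (not from the paper):
# (peptide sequence, 1 = binder / 0 = non-binder, predicted score)
toy = [
    ("A" * 8, 1, 0.9),
    ("C" * 8, 0, 0.2),
    ("D" * 8, 1, 0.4),
    ("E" * 8, 0, 0.6),
    ("A" * 9, 1, 0.8),
    ("C" * 9, 0, 0.1),
]

print(auc_by_length(toy))  # {8: 0.75, 9: 1.0}
```

The pairwise definition is quadratic in the group size, which is fine for a small illustration; a rank-based implementation would be the usual choice for large benchmark sets.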