Pre-training with a rational approach for antibody sequence representation

Antibodies represent a specific class of proteins produced by the adaptive immune system in response to pathogens. Mining the information embedded in antibody amino acid sequences can benefit both antibody property prediction and novel therapeutic development. However, antibodies possess unique feat...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Frontiers in immunology 2024-10, Vol.15, p.1468599
Hauptverfasser: Gao, Xiangrui, Cao, Changling, He, Chenfeng, Lai, Lipeng
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Antibodies represent a specific class of proteins produced by the adaptive immune system in response to pathogens. Mining the information embedded in antibody amino acid sequences can benefit both antibody property prediction and novel therapeutic development. However, antibodies possess unique features that should be incorporated using specifically designed training methods, leaving room for improvement in pre-training models for antibody sequences. In this study, we present a Pre-trained model of Antibody sequences trained with a Rational Approach for antibodies (PARA). PARA employs a strategy conforming to antibody sequence patterns and an advanced natural language processing self-encoding model structure. This approach addresses the limitations of existing protein pre-training models, which primarily utilize language models without fully considering the differences between protein sequences and language sequences. We demonstrate PARA's performance on several tasks by comparing it to various published pre-training models of antibodies. The results show that PARA significantly outperforms existing models on these tasks, suggesting that PARA has an advantage in capturing antibody sequence information. The antibody latent representation provided by PARA can substantially facilitate studies in relevant areas. We believe that PARA's superior performance in capturing antibody sequence information offers significant potential for both antibody property prediction and the development of novel therapeutics. PARA is available at https://github.com/xtalpi-xic.
ISSN:1664-3224
1664-3224
DOI:10.3389/fimmu.2024.1468599