PseAAC2Vec protein encoding for TCR protein sequence classification
Published in: | Computers in Biology and Medicine, 2024-03, Vol. 170, Article 107956 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | The classification and prediction of T-cell receptor (TCR) protein sequences are of significant interest for understanding the immune system and developing personalized immunotherapies. In this study, we propose a novel approach that uses Pseudo Amino Acid Composition (PseAAC) protein encoding for accurate TCR protein sequence classification. The PseAAC2Vec encoding method captures the physicochemical properties of amino acids together with their local sequence information, so that protein sequences can be represented as fixed-length feature vectors. By incorporating physicochemical properties such as hydrophobicity, polarity, charge, molecular weight, and solvent accessibility, PseAAC2Vec provides a comprehensive and informative characterization of TCR protein sequences. To evaluate the proposed encoding, we assembled a large dataset of TCR protein sequences with annotated classes. We applied the PseAAC2Vec encoding scheme to each sequence and generated feature vectors based on a specified window size. We then used state-of-the-art machine learning algorithms, such as support vector machines (SVM) and random forests (RF), to classify the TCR protein sequences. Experimental results on the benchmark dataset demonstrated superior performance of the PseAAC2Vec-based approach compared to existing methods. The PseAAC2Vec encoding effectively captures the discriminative patterns in TCR protein sequences, leading to improved classification accuracy and robustness. Furthermore, the encoding scheme showed promising results across different window sizes, indicating its adaptability to varying sequence contexts.
• The classification and prediction of T-cell receptor (TCR) protein sequences are essential for understanding the immune system and developing personalized immunotherapies.
• However, applying machine learning algorithms to protein sequences can be challenging, as these algorithms typically require numerical embeddings.
• We propose a novel approach that uses Pseudo Amino Acid Composition (PseAAC) protein encoding for accurate TCR protein sequence classification.
• The PseAAC2Vec encoding method captures the physicochemical properties of amino acids and their local sequence information, enabling the representation of protein sequences as fixed-length feature vectors. |
---|---|
ISSN: | 0010-4825 1879-0534 |
DOI: | 10.1016/j.compbiomed.2024.107956 |
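
The summary describes turning variable-length protein sequences into fixed-length vectors from amino acid composition and physicochemical properties, but this record does not give the formula. As a rough illustration only, the sketch below implements the classical pseudo amino acid composition with a single property (Kyte-Doolittle hydrophobicity); it is not the article's PseAAC2Vec method. The function name `pseaac`, the parameters `lam` and `weight`, and the example sequences are assumptions made for this sketch, and `lam` (the maximum sequence lag) may not correspond to the window size mentioned in the summary.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as the only property.
# The article lists several properties; one is enough for a sketch.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {a: (scale[a] - mu) / sigma for a in AMINO_ACIDS}


def pseaac(sequence, lam=3, weight=0.05, scale=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional classical PseAAC vector (hypothetical helper)."""
    seq = sequence.upper()
    if any(a not in scale for a in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    h = _standardize(scale)

    # Composition part: relative frequency of each of the 20 amino acids.
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]

    # Sequence-order part: mean squared property difference at lag k.
    thetas = []
    for k in range(1, lam + 1):
        pairs = len(seq) - k
        total = sum((h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(pairs))
        thetas.append(total / pairs)

    # Joint normalization so that the whole vector sums to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    # Illustrative sequences of different lengths, not taken from the article.
    for s in ("CASSLGQAYEQYF", "CASSIRSSYEQYFGPG"):
        print(s, len(pseaac(s, lam=3)))
```

Every sequence maps to a vector of the same length regardless of its own length, which is what allows standard classifiers such as the SVM or random forest named in the summary to be trained on the result.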