PseAAC2Vec protein encoding for TCR protein sequence classification

Bibliographic details
Published in:	Computers in Biology and Medicine, March 2024, Vol. 170, Article 107956
Authors:	Tayebi, Zahra; Ali, Sarwan; Murad, Taslim; Khan, Imdadullah; Patterson, Murray
Format:	Article
Language:	English
Subject terms:	Adaptability; Algorithms; Amino acid composition; Amino Acid Sequence; Amino acids; Amino Acids - chemistry; Amino Acids - metabolism; Classification; Coding; Computational Biology - methods; Databases, Protein; Datasets; Hydrophobicity; Immune system; Immunotherapy; Lymphocytes T; Machine learning; Molecular weight; Physicochemical properties; Protein composition; Protein sequences; Proteins; Proteins - chemistry; Sequence Analysis, Protein - methods; Sequences; Support Vector Machine; Support vector machines; T cell receptors; TCR

Description
Abstract:	The classification and prediction of T-cell receptor (TCR) protein sequences are of significant interest for understanding the immune system and developing personalized immunotherapies. In this study, we propose a novel approach using Pseudo Amino Acid Composition (PseAAC) protein encoding for accurate TCR protein sequence classification. The PseAAC2Vec encoding method captures the physicochemical properties of amino acids and their local sequence information, enabling the representation of protein sequences as fixed-length feature vectors. By incorporating physicochemical properties such as hydrophobicity, polarity, charge, molecular weight, and solvent accessibility, PseAAC2Vec provides a comprehensive and informative characterization of TCR protein sequences. To evaluate the effectiveness of the proposed PseAAC2Vec encoding approach, we assembled a large dataset of TCR protein sequences with annotated classes. We applied the PseAAC2Vec encoding scheme to each sequence and generated feature vectors based on a specified window size. We then employed state-of-the-art machine learning algorithms, such as support vector machines (SVM) and random forests (RF), to classify the TCR protein sequences. Experimental results on the benchmark dataset demonstrated the superior performance of the PseAAC2Vec-based approach compared to existing methods. The PseAAC2Vec encoding effectively captures discriminative patterns in TCR protein sequences, leading to improved classification accuracy and robustness. Furthermore, the encoding scheme showed promising results across different window sizes, indicating its adaptability to varying sequence contexts.
Highlights:
• The classification and prediction of T-cell receptor (TCR) protein sequences are essential for understanding the immune system and developing personalized immunotherapies.
• However, applying machine learning algorithms to protein sequences can be challenging, as these algorithms typically require numerical embeddings.
• We propose a novel approach using Pseudo Amino Acid Composition (PseAAC) protein encoding for accurate TCR protein sequence classification.
• The PseAAC2Vec encoding method captures the physicochemical properties of amino acids and their local sequence information, enabling the representation of protein sequences as fixed-length feature vectors.
ISSN:	0010-4825; 1879-0534
DOI:	10.1016/j.compbiomed.2024.107956
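The abstract describes the encoding only at a high level. As an illustration of how a PseAAC-style encoder turns a variable-length sequence into a fixed-length vector, the sketch below implements Chou's classic type-I pseudo amino acid composition. It is not necessarily the authors' PseAAC2Vec pipeline. The property scales (Kyte-Doolittle hydropathy and a crude side-chain charge), the weight `w = 0.05`, and the function name `pseaac` are all assumptions. The parameter `lam`, which sets the number of sequence-order correlation tiers, stands in for the "window size" mentioned in the abstract, and that correspondence is likewise assumed.

```python
"""Minimal sketch of Chou-style (type I) pseudo amino acid composition.

Assumptions, not taken from the paper: the two property scales below,
the weight w=0.05, and the hypothetical function name `pseaac`.
"""
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (illustrative choice of property).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Crude side-chain charge at neutral pH (illustrative assumption).
CHARGE = {aa: 0.0 for aa in AMINO_ACIDS}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})


def standardize(scale):
    """Z-score a property over the 20 standard amino acids."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (scale[aa] - mean) / std for aa in AMINO_ACIDS}


PROPERTIES = [standardize(HYDROPATHY), standardize(CHARGE)]


def correlation(a, b):
    """Theta(R_i, R_j): mean squared property difference of two residues."""
    return sum((p[b] - p[a]) ** 2 for p in PROPERTIES) / len(PROPERTIES)


def pseaac(sequence, lam=3, w=0.05):
    """Return a fixed-length vector of 20 + lam features that sum to 1."""
    # Keep only the 20 standard residues; other letters are dropped.
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    # Amino acid composition: normalized frequency of each residue.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    # Sequence-order correlation factors for tiers 1..lam: average
    # property difference between residues k positions apart.
    thetas = [
        sum(correlation(seq[i], seq[i + k]) for i in range(len(seq) - k))
        / (len(seq) - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

For example, `pseaac("CASSLAPGATNEKLFF", lam=5)` (an arbitrary CDR3-like string chosen for illustration) returns a 25-dimensional vector regardless of sequence length. Sweeping `lam` is one way to mimic a window-size comparison, and the resulting vectors could be passed to any SVM or random-forest implementation.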