Machine learning‐based approach for prediction of ion channels and their subclasses
Published in: Journal of Cellular Biochemistry, 2023-01, Vol. 124 (1), pp. 72-88
Main authors:
Format: Article
Language: English
Abstract: Ion channels are ion‐permeable protein pores found in all cellular lipid membranes. Different ion channels play multiple roles in biological processes. Proteomic data are accumulating rapidly owing to advances in mass spectrometry, which gives us the opportunity to explore ion channel classes and their subclasses comprehensively. This paper proposes an eXtreme Gradient Boosting (XGBoost)‐based method to predict ion channel classes and their subclasses. Twelve feature descriptors are used to characterize protein sequences: amino acid composition, pseudo‐amino acid composition, normalized Moreau‐Broto autocorrelation, amphiphilic pseudo‐amino acid composition, dipeptide composition, Geary autocorrelation, tripeptide composition, sequence‐order‐coupling number, composition/transition/distribution, conjoint triad, Moran autocorrelation, and quasi‐sequence‐order descriptors. In total, 9920 features are extracted from each protein sequence. Principal component analysis (PCA) is applied to determine the optimal number of features for best performance. In 10‐fold cross‐validation, the proposed XGBoost‐based approach with the optimal 50 features achieved accuracies of 100%, 98.70%, 98.77%, 97.26%, 87.40%, 97.39%, 98.03%, and 96.42%, and F1‐scores of 100%, 99%, 99%, 97%, 87%, 97%, 98%, and 97%, for prediction of ion channels versus non‐ion channels, voltage‐gated versus ligand‐gated ion channels, subclasses of voltage‐gated ion channels (VGICs), subclasses of ligand‐gated ion channels (LGICs), subclasses of voltage‐gated calcium channels (VGCCs), subclasses of voltage‐gated potassium channels (VGKCs), subclasses of voltage‐gated sodium channels (VGSCs), and subclasses of voltage‐gated chloride channels, respectively.
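Three of the twelve descriptors listed above (amino acid, dipeptide, and tripeptide composition) are simple k‐mer frequency counts and account for 20 + 400 + 8000 = 8420 of the features. The sketch below illustrates how such composition features can be computed from a raw sequence. It is not the authors' code: the residue ordering, the choice to report fractions rather than percentages, and the skipping of windows with non‐standard residues are all assumptions, and the remaining nine descriptor families are omitted.

```python
from itertools import product
from typing import List

# The 20 standard amino acids in a fixed order (an assumption; descriptor
# libraries may order them differently).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq: str, k: int) -> List[float]:
    """Return normalized frequencies of all 20**k amino-acid k-mers in seq.

    k=1 gives amino acid composition (20 features), k=2 dipeptide
    composition (400) and k=3 tripeptide composition (8000).
    Windows containing non-standard residues are skipped (assumption).
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
            total += 1
    # Fractions, not percentages (assumption); too-short input yields zeros.
    return [c / total if total else 0.0 for c in counts]


def composition_features(seq: str) -> List[float]:
    """Concatenate AAC, DPC and TPC into one 8420-dimensional vector."""
    seq = seq.upper()
    return (kmer_composition(seq, 1)
            + kmer_composition(seq, 2)
            + kmer_composition(seq, 3))


if __name__ == "__main__":
    features = composition_features("MKTLLVAGALLA")  # toy sequence, not from the paper
    print(len(features))  # 8420
```

Vectors like this, concatenated with the other descriptor families, would form the per‐sequence feature matrix that is then reduced before classification.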
The proposed approach is also compared with other classifiers, namely support vector machine (SVM), k‐nearest neighbor (k‐NN), Gaussian Naïve Bayes, and random forest, as well as with existing methods: an SVM with maximum relevance maximum distance (MRMD), which achieved accuracies of 86.6%, 83.7%, and 85.1% for ion channels, non‐ion channels, and overall, respectively, and an SVM method based on a radial basis function (RBF) kernel, which achieved accuracies of 100%, 97%, and 99.9% for ion channels, non‐ion channels, and overall, respectively.
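The evaluation described in the abstract (PCA down to 50 components, then XGBoost and the SVM, k‐NN, Gaussian Naïve Bayes, and random forest baselines scored by 10‐fold cross‐validation) can be sketched with scikit‐learn as below. This is a hedged illustration, not the published pipeline: the standardization step, the hyperparameters, the use of macro‐averaged F1, and the synthetic placeholder data are assumptions, and XGBoost is included only if the optional `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

try:
    from xgboost import XGBClassifier  # optional dependency
except ImportError:
    XGBClassifier = None


def build_models(n_components=50, seed=0):
    """Wrap each classifier in scaling + PCA (hyperparameters are assumptions)."""
    classifiers = {
        "SVM": SVC(kernel="rbf"),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
        "GaussianNB": GaussianNB(),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    if XGBClassifier is not None:
        classifiers["XGBoost"] = XGBClassifier(n_estimators=200, random_state=seed)
    # PCA is fitted inside each training fold, so test folds never leak into it.
    return {
        name: make_pipeline(StandardScaler(),
                            PCA(n_components=n_components, random_state=seed),
                            clf)
        for name, clf in classifiers.items()
    }


def compare(X, y, n_components=50, n_splits=10, seed=0):
    """Return {model name: (mean accuracy, mean macro-F1)} over stratified k-fold CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in build_models(n_components, seed).items():
        scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "f1_macro"))
        results[name] = (scores["test_accuracy"].mean(), scores["test_f1_macro"].mean())
    return results


if __name__ == "__main__":
    # Synthetic placeholder data; the real feature matrix would have 9920 columns.
    X, y = make_classification(n_samples=400, n_features=300,
                               n_informative=30, random_state=0)
    for name, (acc, f1) in compare(X, y).items():
        print(f"{name:12s} accuracy={acc:.3f} macro-F1={f1:.3f}")
```

Putting PCA inside the pipeline is a deliberate design choice: fitting it on the full dataset before cross‐validation would let information from the test folds influence the projection and could inflate the reported scores.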
ISSN: 0730-2312, 1097-4644
DOI: 10.1002/jcb.30343