Identification of DNA-binding proteins by Kernel Sparse Representation via L2,1-matrix norm

Bibliographic Details
Published in: Computers in biology and medicine 2023-06, Vol.159, p.106849-106849, Article 106849
Main authors: Ming, Yutong; Liu, Hongzhi; Cui, Yizhi; Guo, Shaoyong; Ding, Yijie; Liu, Ruijun
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Summary: An understanding of DNA-binding proteins is helpful in exploring the role that proteins play in cell biology. Furthermore, the prediction of DNA-binding proteins is essential for the chemical modification and structural composition of DNA, and is of great importance in protein functional analysis and drug design. In recent years, DNA-binding protein prediction has typically used machine learning-based methods. The prediction accuracy of various classifiers has improved considerably, but researchers continue to spend time and effort on improving prediction performance. In this paper, we combine protein sequence evolutionary information with a classification method based on kernel sparse representation, yielding a machine learning model that identifies DNA-binding proteins from sequence information. Experiments show good prediction accuracy on both the PDB1075 and PDB186 datasets. Our cross-validation result on the PDB1075 training set was 81.37%, and our independent test result on PDB186 was 83.9%, both of which outperformed the other methods to some extent. The proposed method is therefore shown to be effective and feasible for predicting DNA-binding proteins.
•One advantage of KSRC is that it can efficiently learn high-dimensional features of protein sequences without being affected by dimensionality. In addition, KSRC can give a small weight to noisy or redundant data without affecting the computation of the model, and has high stability and interpretability. Therefore, we choose KSRC as the classifier for recognizing DNA-binding proteins; its high accuracy and its ability to handle high-dimensional features can significantly improve the classification accuracy of DNA-binding proteins.
•KSRC with the L1-norm is very time-consuming because the representation coefficients must be found individually for every sample in the testing set. Consequently, we compute the solution of the sparse representation by means of the L2,1-norm matrix terms.
•The protein evolutionary information extracted by PSSM can improve the accuracy of protein description and the predictive performance of DBP prediction. We use different feature extraction methods to extract features representi
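The record does not give the formulation behind the L2,1-norm step, so the following is only a minimal sketch of one common way to set it up, not the authors' implementation. It assumes an RBF kernel, the objective ||Φ(Y) − Φ(X)A||_F² + λ||A||_{2,1}, and an iteratively reweighted least-squares solver; the names `rbf_kernel`, `l21_kernel_codes`, and `ksrc_predict`, and the defaults for `gamma` and `lam`, are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise RBF kernel between rows of A and rows of B (assumed kernel choice).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def l21_kernel_codes(K, Kt, lam=0.1, n_iter=30, eps=1e-8):
    # K:  (n, n) kernel among training samples.
    # Kt: (n, m) kernel between training samples and all m test samples.
    # Returns A (n, m): one coefficient column per test sample, solved jointly.
    n = K.shape[0]
    A = np.linalg.solve(K + lam * np.eye(n), Kt)  # ridge solution as a starting point
    for _ in range(n_iter):
        # Reweighting: rows of A with small norm receive a larger penalty,
        # which pushes whole rows toward zero (the L2,1 row-sparsity effect).
        row_norms = np.sqrt((A ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
        A = np.linalg.solve(K + lam * D, Kt)
    return A

def ksrc_predict(X_train, y_train, X_test, gamma=0.5, lam=0.1):
    K = rbf_kernel(X_train, X_train, gamma)
    Kt = rbf_kernel(X_train, X_test, gamma)
    A = l21_kernel_codes(K, Kt, lam)
    classes = np.unique(y_train)
    residuals = np.empty((len(classes), X_test.shape[0]))
    for ci, c in enumerate(classes):
        idx = np.where(y_train == c)[0]
        Ac = A[idx]                    # coefficients on class-c training samples
        Kc = K[np.ix_(idx, idx)]
        # Feature-space reconstruction error using class-c samples only:
        # k(y, y) - 2 a_c^T k_c(y) + a_c^T K_cc a_c, with k(y, y) = 1 for an RBF kernel.
        residuals[ci] = 1.0 - 2.0 * (Ac * Kt[idx]).sum(0) + ((Kc @ Ac) * Ac).sum(0)
    # Assign each test sample to the class with the smallest residual.
    return classes[residuals.argmin(axis=0)]
```

In this sketch every iteration is a single linear solve whose right-hand side holds all test samples at once, which illustrates the contrast with an L1-norm formulation that runs a separate sparse-coding problem per test sample.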
ISSN: 0010-4825, 1879-0534
DOI:10.1016/j.compbiomed.2023.106849