Method for predicting various lysine modification sites based on ClusterCentroids under-sampling technology
Main authors: | , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Summary: | The invention belongs to the field of artificial intelligence algorithms applied to biological sequence recognition, and relates to a method for predicting multiple types of lysine modification sites based on ClusterCentroids undersampling. First, through data collection, integration, redundancy elimination, feature space optimization and redundant information reduction, a batch of protein sequences with a significant class imbalance problem is obtained as input data. Next, the protein sequences are feature-encoded with a multi-label position-specific triple amino acid propensity feature extraction algorithm to obtain an input feature matrix. Then the ClusterCentroids framework, assisted by the MiniBatchKMeans algorithm, computes cluster centers of the majority classes to process the imbalanced data set, so that the model predicts the various modification sites well. According to the invention, prediction of a p |
---|
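
The undersampling step named in the summary can be illustrated with a minimal sketch. It is not the patent's implementation: the toy 2-D points stand in for the encoded feature matrix, the function names `kmeans_centroids` and `cluster_centroid_undersample` are hypothetical, and plain k-means is swapped in for MiniBatchKMeans so that the example needs only the Python standard library. The idea shown is the one the summary describes: each majority class is replaced by cluster centers, while the minority class is left untouched.

```python
import random


def kmeans_centroids(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (a stand-in for MiniBatchKMeans); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_centroid_undersample(X, y, seed=0):
    """Shrink every class larger than the smallest one to that size using centroids."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    n_min = min(len(rows) for rows in by_class.values())

    X_res, y_res = [], []
    for label, rows in by_class.items():
        if len(rows) > n_min:
            # Majority class: replace its samples by n_min cluster centers.
            rows = kmeans_centroids(rows, n_min, seed=seed)
        X_res.extend(rows)
        y_res.extend([label] * len(rows))
    return X_res, y_res


# Hypothetical toy data: 90 negative sites (label 0) and 10 positive sites (label 1).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_res, y_res = cluster_centroid_undersample(X, y)
print(y_res.count(0), y_res.count(1))  # prints "10 10"
```

In practice, the imbalanced-learn library provides a `ClusterCentroids` sampler whose `estimator` argument accepts a clustering object such as scikit-learn's `MiniBatchKMeans`. Whether the patent uses that library or its own implementation is not stated in the summary.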