Method for predicting various lysine modification sites based on ClusterCentroids under-sampling technology

Bibliographic details
Main authors: DENG ZHAOHONG, WAN MINQUAN, ZHANG BANGYI, ZUO YUN, FANG XINGZE
Format: Patent
Language: Chinese; English
Abstract: The invention belongs to the field of artificial intelligence algorithm applications in biological sequence recognition. It relates to a method for predicting multiple types of lysine modification sites based on ClusterCentroids undersampling. First, a batch of protein sequences with significant class imbalance is obtained as input data through data collection, integration, redundancy elimination, feature space optimization and redundant information reduction. Next, the protein sequences are feature-encoded with a multi-label position-specific triple amino acid propensity feature extraction algorithm to obtain an input feature matrix. Then, the ClusterCentroids framework, assisted by the MiniBatchKMeans algorithm, computes cluster centres of the majority class to handle the imbalanced data set. This ensures that the model predicts the various modification sites well. According to the invention, prediction of a p…
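The abstract names two core steps: position-specific triple amino acid propensity encoding, and ClusterCentroids undersampling of the majority class with mini-batch k-means. The sketch below illustrates both under stated assumptions.

- It uses a single-label (binary) propensity encoding, where feature i is the position-specific frequency of the triple among positive fragments minus that among negative fragments. This is not the patent's multi-label variant.
- It assumes fixed-length, lysine-centred windows and random toy fragments.
- It uses a from-scratch mini-batch k-means rather than a library.

All function names are hypothetical. A library-based version could instead pass scikit-learn's `MiniBatchKMeans` as the `estimator` of imbalanced-learn's `ClusterCentroids`.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def triples(fragment):
    """Overlapping amino acid triples of a fixed-length fragment."""
    return [fragment[i:i + 3] for i in range(len(fragment) - 2)]


def position_triple_freqs(fragments):
    """For each position i, the relative frequency of each triple starting at i."""
    n_pos = len(fragments[0]) - 2
    tables = [Counter() for _ in range(n_pos)]
    for frag in fragments:
        for i, t in enumerate(triples(frag)):
            tables[i][t] += 1
    return [{t: c / len(fragments) for t, c in tab.items()} for tab in tables]


def pstaap_encode(fragment, f_pos, f_neg):
    """Binary (single-label) encoding: feature i = F+(triple at i) - F-(triple at i)."""
    return [f_pos[i].get(t, 0.0) - f_neg[i].get(t, 0.0)
            for i, t in enumerate(triples(fragment))]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def minibatch_kmeans(points, k, batch_size=8, n_iter=50, seed=0):
    """Mini-batch k-means with a per-centre learning rate of 1/count."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(n_iter):
        batch = [points[rng.randrange(len(points))] for _ in range(batch_size)]
        # Assign the whole batch to the current centres before updating any of them.
        nearest = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in batch]
        for p, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * v for c, v in zip(centers[j], p)]
    return centers


def cluster_centroids_undersample(X, y, majority=0, seed=0):
    """Replace the majority class by k centroids, where k is the size of the other class."""
    maj = [x for x, label in zip(X, y) if label == majority]
    rest = [(x, label) for x, label in zip(X, y) if label != majority]
    k = len(rest)
    centers = minibatch_kmeans(maj, k, seed=seed)
    return centers + [x for x, _ in rest], [majority] * k + [lab for _, lab in rest]


rng = random.Random(1)


def toy_fragment():
    # Hypothetical 7-residue window with the lysine (K) at the centre.
    res = [rng.choice(AMINO_ACIDS) for _ in range(7)]
    res[3] = "K"
    return "".join(res)


pos = [toy_fragment() for _ in range(5)]    # modified sites (label 1), minority
neg = [toy_fragment() for _ in range(40)]   # non-sites (label 0), majority
f_pos, f_neg = position_triple_freqs(pos), position_triple_freqs(neg)
X = [pstaap_encode(s, f_pos, f_neg) for s in pos + neg]
y = [1] * len(pos) + [0] * len(neg)
X_bal, y_bal = cluster_centroids_undersample(X, y, majority=0)
print(Counter(y), "->", Counter(y_bal))
```

After undersampling, the majority class consists of synthetic centroid vectors rather than original fragments. In practice, the propensity tables would be fitted on training fragments only, so that labels from the evaluation set do not leak into the features.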