Prediction of aptamer–protein interacting pairs based on sparse autoencoder feature extraction and an ensemble classifier

Bibliographic details
Published in: Mathematical biosciences 2019-05, Vol.311, p.103-108
Authors: Yang, Qing; Jia, Cangzhi; Li, Taoying
Format: Article
Language: English
Description
Abstract:
• Aptamer–protein interacting pairs play important roles in physiological functions and structural characterization. Identifying aptamer–protein interacting pairs is challenging and limited, despite the tremendous applications of aptamers.
• A sparse autoencoder was used to characterize features for the target protein sequences.
• Gradient boosting decision tree and incremental feature selection methods were used to obtain the optimal combination of features.

Aptamer–protein interacting pairs play important roles in physiological functions and structural characterization. Identifying aptamer–protein interacting pairs is challenging and limited, despite the tremendous applications of aptamers. Therefore, it is vital to construct a high-performance prediction model for identifying aptamer–target interacting pairs. In this study, a novel ensemble method is presented to predict aptamer–protein interacting pairs by integrating sequence characteristics derived from aptamers and the target proteins. The features extracted for aptamers were the compositions of amino acids and pseudo K-tuple nucleotides. In addition, a sparse autoencoder was used to characterize features for the target protein sequences. To remove redundant features, gradient boosting decision tree (GBDT) and incremental feature selection (IFS) methods were used to obtain the optimum combination of sequence characteristics. Based on 616 selected features, an ensemble of three support vector machine (SVM) sub-classifiers was used to construct the prediction model. Evaluated on an independent dataset, the predictor obtained an accuracy of 75.7%, a Matthews correlation coefficient of 0.478, and a Youden's index of 0.538, which were superior to the values reached by other existing predictors. The results show that the model can be used to distinguish novel aptamer–protein interacting pairs and to reveal the interrelation between aptamers and proteins.
ISSN: 0025-5564, 1879-3134
DOI: 10.1016/j.mbs.2019.01.009
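
The abstract reports three evaluation measures: accuracy, the Matthews correlation coefficient, and Youden's index. As a reading aid, the sketch below shows how these are conventionally computed from the counts of a binary confusion matrix. It is not taken from the article: the function name `binary_metrics` and the example counts are hypothetical, and the abstract does not give the confusion matrix behind the reported values.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, MCC, and Youden's index from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs here).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0

    # Sensitivity: fraction of interacting pairs predicted as interacting.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-interacting pairs predicted as such.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Youden's index J = sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1.0

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "mcc": mcc, "youden": youden}


# Illustrative counts only; these are not the article's data.
example = binary_metrics(tp=40, fp=20, tn=30, fn=10)
print(example)
```

Both the Matthews correlation coefficient and Youden's index take false positives and false negatives into account, which makes them more informative than accuracy alone when the positive and negative classes are imbalanced.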