Predicting membrane protein types by incorporating a novel feature set into Chou's general PseAAC

Bibliographic details
Published in: Journal of theoretical biology, 2018-10, Vol. 455, p. 319-328
Main authors: Sankari, E. Siva; Manimegalai, D.
Format: Article
Language: English
Description
Summary:
• A novel feature set for predicting membrane protein types is proposed.
• The performance of decision-tree-based classifiers (Decision tree, CART, Adaboost, RUS boost, Rotation forest and Random forest) is compared in predicting membrane protein types.
• The novel feature set, with fewer features than the existing feature set, performs equivalently to it.
• A new dataset is used.
• Random forest and Adaboost perform well.

Membrane proteins are a vital class of proteins that serve as channels, receptors and energy transducers in a cell. They perform various important functions, which are mainly associated with their types. They are also attractive drug discovery targets for various diseases. Predicting membrane protein types is therefore a crucial and challenging research area in bioinformatics and proteomics. Given the vast number of uncharacterized protein sequences in databases, conventional biophysical techniques are extremely tedious, costly and error-prone. Consequently, it is highly desirable to build a robust, reliable and efficient method to predict membrane protein types. In this work, a novel feature set, Exchange Group Based Protein Sequence Representation (EGBPSR), is proposed for the classification of membrane proteins, together with two new feature extraction strategies known as Exchange Group Local Pattern (EGLP) and Amino acid Interval Pattern (AIP). Imbalanced and large datasets are often handled well by decision tree classifiers. Because the dataset used is imbalanced, the performance of various decision tree classifiers is analyzed: Decision Tree (DT), Classification and Regression Tree (CART), and ensemble methods such as Adaboost, Random Under Sampling (RUS) boost, Rotation forest and Random forest. The overall accuracy achieved in predicting membrane protein types is 96.45%.
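To make the idea of an exchange-group representation more concrete, the following is a minimal sketch, not the authors' EGBPSR, EGLP or AIP procedure, whose details are not given in this record. It assumes the classical six Dayhoff-style exchange groups (the grouping, the labels `e1` to `e6`, and the helper names `to_exchange_groups` and `group_composition` are all illustrative assumptions) and shows only how a sequence could be recoded into groups and summarized as a six-value composition vector.

```python
from collections import Counter

# Assumption: the classical six Dayhoff-style exchange groups.
# The record does not state which grouping the paper actually uses.
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}

# Reverse lookup: one-letter amino acid code -> exchange-group label.
RESIDUE_TO_GROUP = {
    aa: group for group, members in EXCHANGE_GROUPS.items() for aa in members
}


def to_exchange_groups(sequence):
    """Recode a protein sequence as a list of exchange-group labels.

    Symbols outside the 20 standard amino acids (e.g. 'X') are skipped.
    """
    return [RESIDUE_TO_GROUP[aa] for aa in sequence.upper() if aa in RESIDUE_TO_GROUP]


def group_composition(sequence):
    """Return the fraction of residues that fall into each exchange group."""
    groups = to_exchange_groups(sequence)
    counts = Counter(groups)
    total = len(groups)
    return {g: (counts[g] / total if total else 0.0) for g in EXCHANGE_GROUPS}


if __name__ == "__main__":
    # Hypothetical toy sequence, used only for illustration.
    print(group_composition("MKTAYIAKQR"))
```

A vector like this has only six entries per sequence, which illustrates why group-based encodings can be much more compact than features defined over all 20 amino acids; the pattern-based features named in the abstract would build on such a recoding in ways this sketch does not attempt to reproduce.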
ISSN: 0022-5193, 1095-8541
DOI: 10.1016/j.jtbi.2018.07.032