Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets

Bibliographic details
Published in: Journal of theoretical biology 2017-12, Vol.435, p.208-217
Main authors: Sankari, E. Siva; Manimegalai, D.
Format: Article
Language: English
Description
Abstract:
• The performance of various decision tree classifiers is compared in predicting membrane protein types.
• The performance of the decision tree classifiers is also compared with SVM and Naive Bayes classifiers.
• Among the decision tree classifiers, Random forest performs well in less time with good accuracy.
• The RUS boost decision tree and Random tree classifiers are able to classify one or two samples in the class with very few samples.

Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditionally, biophysical methods are used to classify membrane protein types. Because of the large number of uncharacterized protein sequences in databases, these traditional methods are very time-consuming, expensive, and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Because the datasets used are imbalanced, the performance of various decision tree classifiers is analysed: Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, and REP (Reduced Error Pruning) tree, together with the ensemble methods Adaboost, RUS (Random Under Sampling) boost, Rotation forest, and Random forest. Among these classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. Another finding is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, whereas other classifiers such as DT, Adaboost, Rotation forest, and Random forest are not sensitive to classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers.
ISSN: 0022-5193, 1095-8541
DOI:10.1016/j.jtbi.2017.09.018
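
Illustrative note: the abstract names RUS (Random Under Sampling) boost as one way of handling imbalanced classes. The sketch below shows only the random undersampling idea, in plain Python, and is not the authors' implementation. It assumes that every class is cut down to the size of the rarest class, which is one common choice. The function name random_undersample and the toy labels are hypothetical, and the boosting step that RUS boost combines with undersampling is left out.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class.

    Hypothetical helper for illustration only; it covers the undersampling
    step and does not perform any boosting.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Size of the rarest class (assumed balancing target).
    n_min = min(len(xs) for xs in by_class.values())

    # Draw n_min samples without replacement from each class.
    kept_samples, kept_labels = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            kept_samples.append(x)
            kept_labels.append(y)
    return kept_samples, kept_labels


if __name__ == "__main__":
    # Toy imbalanced data: ten samples of one class, three of another.
    toy_samples = list(range(13))
    toy_labels = ["type_A"] * 10 + ["type_B"] * 3
    _, balanced_labels = random_undersample(toy_samples, toy_labels, seed=1)
    print(Counter(balanced_labels))
```

In the toy run each class ends up with three samples. A classifier trained on the balanced subset sees the rare class as often as the common one, at the cost of discarding majority-class data.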