Feature selection and granular SVM classification for protein arginine methylation identification

Protein methylation modification has been discovered for half a century but still far less been studied than other modifications. Computational analysis is recently introduced to discover other unknown methylation sites based on few known ones. To effectively predict possible methylation, sophistica...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Zejin Ding, Yan-Qing Zhang, Zheng, Y.G.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Protein methylation modification has been discovered for half a century but still far less been studied than other modifications. Computational analysis is recently introduced to discover other unknown methylation sites based on few known ones. To effectively predict possible methylation, sophisticated classification strategy should be well devised. In this paper, we first extracted informative features from methylated fragments in many protein sequences, including the physicochemical properties, secondary structure information, evolutionary profiles, and solvent accessibility of surrounding residues. Then, an efficient feature selection method (mRMR) is applied to eliminate redundant features but keep important ones. Since methylated residues are far less than non-methylated, the collected data is relatively imbalanced. Thus, we propose to use the granular support vector machine (GSVM) which is specially designed for imbalanced classification problems. A 7-fold cross validation shows that our strategy generates comparable predication accuracy with many current methods or even better. Meanwhile, our method provides insights to identify the underlying mechanisms of protein methylation.
ISSN:1062-922X
2577-1655
DOI:10.1109/ICSMC.2009.5345973