Structural Domain Based Multiple Instance Learning for Predicting Gram-Positive Bacterial Protein Subcellular Localization

Bibliographic details
Main authors: Mei, S.Y., Wang Fei
Format: Conference proceeding
Language: English
Description
Summary: Until recently, few studies have been reported on Gram-positive protein subcellular location prediction, and novel computational methods are needed to help biologists design experiments. In this paper, we propose a novel machine learning model for predicting Gram-positive protein subcellular localization, as an alternative to the existing model Gpos-PLoc when the required GO annotation information is unavailable. The model uses protein structural domains as indicators of protein subcellular location. To capture protein sequence local information and structural domain boundary partition information, a novel method called multiple instance multiclass learning (MIMC) is proposed for predicting protein subcellular location, where a domain is taken as an instance of a protein and a protein as a bag of domains. Because some proteins may have multiple subcellular locations, we introduce a related model called multiple instance multiple label learning (MIML) to predict potential minor subcellular locations. Protein sequences and domains are encoded using simple 20-D amino acid composition (AA), so that feature dimensionality is greatly reduced and the instance representation can capture domain boundary partition information, in contrast to a flat domain vector representation. Experiments show that the simple AA representation outperforms the order-based Pseudo Amino Acid (PseAA) representation, and that the MIMC model performs comparably to Chou's OET-NN ensemble (Gpos-PLoc), which is, to the best of our knowledge, the only machine learning model for Gram-positive protein subcellular location prediction thus far.
DOI:10.1109/IJCBS.2009.14
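
The bag-of-domains encoding described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `aa_composition` and `protein_to_bag`, the half-open `(start, end)` boundary format, and the toy sequence are all assumptions made for illustration. Each domain is encoded as a 20-D amino acid composition vector (one instance), and a protein becomes the list of its domain vectors (one bag).

```python
from collections import Counter

# The 20 standard amino acids, in a fixed (alphabetical one-letter) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the 20-D amino acid composition of a sequence.

    Each entry is the relative frequency of one standard amino acid.
    Non-standard symbols (e.g. 'X') are ignored. An empty sequence
    yields an all-zero vector.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def protein_to_bag(sequence, domain_boundaries):
    """Encode a protein as a bag of domain instances.

    domain_boundaries is a hypothetical list of half-open, 0-based
    (start, end) index pairs, one per structural domain. Each domain
    becomes one 20-D instance; the protein is the list of instances.
    """
    return [aa_composition(sequence[start:end])
            for start, end in domain_boundaries]


# Toy example: an 8-residue "protein" with two assumed domains.
bag = protein_to_bag("AAAACCCC", [(0, 4), (4, 8)])
```

Because every instance has 20 entries regardless of how many domains a protein has, the feature dimensionality stays fixed, while the split into separate instances preserves where the domain boundaries fall. A multiple instance classifier would then be trained on such bags; that step is not shown here.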