Structural Domain Based Multiple Instance Learning for Predicting Gram-Positive Bacterial Protein Subcellular Localization
Main authors: | , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Until recently, few studies have been reported on predicting the subcellular location of Gram-positive bacterial proteins, and novel computational methods are needed to help biologists design experiments. In this paper, we propose a novel machine learning model for predicting Gram-positive protein subcellular localization as an alternative to the existing model Gpos-PLoc when the GO annotation information it requires is unavailable. The model uses protein structural domains as indicators of subcellular location. To capture local sequence information and structural domain boundary partition information, we propose a novel method called multiple instance multiclass learning (MIMC), in which each domain is taken as an instance and each protein as a bag of domains. Because some proteins may have multiple subcellular locations, we introduce a related model, multiple instance multiple label learning (MIML), to predict potential minor subcellular locations. Protein sequences and domains are encoded using a simple 20-D amino acid composition (AA), so that feature dimensionality is greatly reduced and, compared with a flat domain vector representation, the instance representation can capture domain boundary partition information. Experiments show that the simple AA representation outperforms the order-based pseudo amino acid (PseAA) representation, and that the MIMC model performs comparably to Chou's OET-NN ensemble (Gpos-PLoc), which is, to the best of our knowledge, the only machine learning model for Gram-positive protein subcellular location prediction thus far. |
---|---|
DOI: | 10.1109/IJCBS.2009.14 |
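
The bag-of-domains encoding described in the summary can be illustrated with a minimal Python sketch. It assumes that domain boundaries are already known and given as half-open `(start, end)` index pairs; the function names `aa_composition` and `protein_to_bag` and the example sequence are hypothetical and are not taken from the paper. The sketch covers only the 20-D amino acid composition encoding of instances, not the MIMC or MIML learners themselves.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature axes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(segment):
    """Return the 20-D amino acid composition (relative frequencies) of a segment."""
    counts = Counter(segment.upper())
    # Only standard residues contribute; other symbols (e.g. 'X') are ignored.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def protein_to_bag(sequence, domain_boundaries):
    """Encode a protein as a bag: one 20-D composition vector per domain.

    domain_boundaries is an assumed input: a list of half-open (start, end)
    index pairs, one per structural domain.
    """
    return [aa_composition(sequence[start:end]) for start, end in domain_boundaries]


# Hypothetical toy protein with two domains.
bag = protein_to_bag("AAAACCCC", [(0, 4), (4, 8)])
```

Each protein then yields a variable-length list of fixed-length vectors, which is the input shape a multiple instance learner expects, while the whole-sequence composition remains available through `aa_composition(sequence)`.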