Predicting essential genes of 41 prokaryotes by a semi-supervised method
Published in: | Analytical biochemistry 2020-11, Vol.609, p.113919-113919, Article 113919 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Essential genes are vital to the survival and reproduction of organisms. Many machine learning methods have been employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised and may not obtain the desired result when labeled data are insufficient. In this paper, we propose a classifier based on the learning with local and global consistency (LGC) method and use it to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can construct a prediction model using limited label and constraint information. The performance of the proposed classifier was evaluated by intra-organism prediction and leave-one-species-out validation. The average AUC value across the 41 organisms in intra-organism prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality.
• Semi-supervised learning methods are widely used and perform well.
• The graph-based semi-supervised learning method LGC was used to construct the essential gene classifier.
• The results illustrate that the ratio of labeled samples only slightly affects the performance of the prediction model.
• The LGC-based prediction model performs well when only a few labeled samples are available.
• The results of leave-one-species-out prediction demonstrate that the proposed method has good universality. |
---|---|
ISSN: | 0003-2697 1096-0309 |
DOI: | 10.1016/j.ab.2020.113919 |
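The abstract describes LGC as a graph-based semi-supervised method that propagates a small number of labels over a similarity graph. As a rough illustration only, the sketch below implements the standard closed-form LGC solution, F = (I - alpha*S)^(-1) Y with S = D^(-1/2) W D^(-1/2), on toy data. The function name `lgc_scores`, the RBF affinity, and the values of `alpha` and `sigma` are assumptions made for this example; the record does not state the features, graph construction, or hyperparameters used in the paper.

```python
import numpy as np

def lgc_scores(X, y, alpha=0.99, sigma=1.0):
    """Closed-form LGC label propagation (illustrative sketch).

    X: (n, d) feature matrix, one row per gene (hypothetical features).
    y: length-n labels, 1 = essential, 0 = non-essential, -1 = unlabeled.
    Returns an (n, 2) score matrix; argmax over columns gives the class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    n = X.shape[0]

    # Assumed affinity: RBF kernel on squared Euclidean distance, zero diagonal.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    degree = np.maximum(W.sum(axis=1), 1e-12)  # guard against isolated nodes
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # One-hot initial labels; unlabeled rows stay all-zero.
    Y = np.zeros((n, 2))
    labeled = y >= 0
    Y[labeled, y[labeled]] = 1.0

    # Solve (I - alpha * S) F = Y instead of forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - alpha * S, Y)

# Toy usage: two well-separated clusters with one labeled point each.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y_toy = np.array([0, -1, -1, 1, -1, -1])
predicted = lgc_scores(X_toy, y_toy).argmax(axis=1)
```

For ranking-based metrics such as AUC, the column of essential-class scores (or the difference between the two columns) could serve as the decision value. scikit-learn's `LabelSpreading` offers a maintained implementation of a closely related formulation.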