Identifying essential genes across eukaryotes by machine learning

Bibliographic details
Published in: NAR Genomics and Bioinformatics, 2021-12, Vol. 3 (4), p. lqab110
Main authors: Beder, Thomas; Aromolaran, Olufemi; Dönitz, Jürgen; Tapanelli, Sofia; Adedeji, Eunice O; Adebiyi, Ezekiel; Bucher, Gregor; Koenig, Rainer
Format: Article
Language: English
Description
Summary: Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms, essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, divergent essentiality information is obtained from studying single cells or whole, multi-cellular organisms, particularly when derived from human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60 381 genes, using 41 635 features derived from the sequence, gene function information and network topology. Within a leave-one-organism-out cross-validation, the classifiers showed high generalizability, with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to Tribolium castaneum and Bombyx mori and validated predictions experimentally, yielding similar performances. Finally, using the classifier based on the studied model organisms enabled linking the essentiality information of human cell line screens and population studies.
Graphical abstract: CLEARER is a machine learning approach for predicting essential genes across eukaryotes. The classifier is trained on multiple species, allowing identification of essential genes in a new species with high accuracy.
ISSN:2631-9268
DOI:10.1093/nargab/lqab110
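
The leave-one-organism-out cross-validation named in the summary can be illustrated with a minimal sketch. This is not the authors' CLEARER implementation: the function names, the toy data and the majority-class stand-in for a classifier are all hypothetical, chosen only to show how each organism is held out once and how the per-organism accuracies are averaged.

```python
from collections import Counter


def leave_one_organism_out(organisms):
    """Yield (held_out, train_idx, test_idx) once per distinct organism label."""
    for held_out in sorted(set(organisms)):
        train_idx = [i for i, o in enumerate(organisms) if o != held_out]
        test_idx = [i for i, o in enumerate(organisms) if o == held_out]
        yield held_out, train_idx, test_idx


def fit_majority_baseline(train_features, train_labels):
    """Hypothetical stand-in for a real classifier: ignores the features and
    always predicts the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _feature_row: majority


def mean_held_out_accuracy(features, labels, organisms, fit=fit_majority_baseline):
    """Score each organism with a model trained only on the other organisms,
    then average the per-organism accuracies."""
    per_organism = {}
    for held_out, train_idx, test_idx in leave_one_organism_out(organisms):
        predict = fit([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(features[i]) == labels[i] for i in test_idx)
        per_organism[held_out] = correct / len(test_idx)
    mean_acc = sum(per_organism.values()) / len(per_organism)
    return mean_acc, per_organism


# Toy data (hypothetical): 8 genes from 3 organisms, label 1 = essential.
organisms = ["org_a", "org_a", "org_a", "org_b", "org_b", "org_c", "org_c", "org_c"]
labels = [1, 0, 0, 0, 1, 0, 0, 1]
features = [[0.0] for _ in labels]  # placeholder feature rows

mean_acc, per_organism = mean_held_out_accuracy(features, labels, organisms)
print(per_organism, mean_acc)
```

Replacing `fit_majority_baseline` with a function that trains a real model on the feature rows leaves the evaluation loop unchanged. The point of the split is that no gene from the held-out organism is ever seen during training.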