Predicting essential genes of 41 prokaryotes by a semi-supervised method

Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result...

Bibliographic details
Published in: Analytical biochemistry 2020-11, Vol.609, p.113919-113919, Article 113919
Main authors: Liu, Xiao; He, Ting; Guo, Zhirui; Ren, Meixiang; Luo, Yachuan
Format: Article
Language: English
Online access: Full text
container_end_page 113919
container_issue
container_start_page 113919
container_title Analytical biochemistry
container_volume 609
creator Liu, Xiao
He, Ting
Guo, Zhirui
Ren, Meixiang
Luo, Yachuan
description Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value of 41 organisms in intra-organisms prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality.
• The semi-supervised learning methods were widely used and perform well.
• The graph-based semi-supervised learning methods LGC was used to construct the essential genes classifier.
• The results illustrate that the ratio of labeled samples affects the performance of the prediction model slightly.
• The LGC-based prediction model performs well when there are a few labeled samples.
• The results of leave-one-species-out prediction demonstrate that the proposed method has good universality.
doi_str_mv 10.1016/j.ab.2020.113919
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2436397046</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0003269720304516</els_id><sourcerecordid>2436397046</sourcerecordid><originalsourceid>FETCH-LOGICAL-c350t-f5db85411e31f2eda877e70563d348b121ac4cce245e46b8980c110dfc98b6033</originalsourceid><addsrcrecordid>eNp1kDFPwzAQhS0EglLYmVBGlpQ723ESNoSAIlWCAWbLsS_g0iTFTpH67zEqsDGd7vTe072PsTOEGQKqy-XMNDMOPK0oaqz32AShVjkIqPfZBABEzlVdHrHjGJcAiLJQh-xI8IqXUhUTNn8K5Lwdff-aUYzUj96sslfqKWZDm0nM1mF4N2E7jOnSbDOTRep8HjdrCp8-kss6Gt8Gd8IOWrOKdPozp-zl7vb5Zp4vHu8fbq4XuRUFjHlbuKYqJCIJbDk5U5UllVAo4YSsGuRorLSWuCxIqqaqK7CI4FpbV40CIabsYpeb_vrYUBx156Ol1cr0NGyi5lIoUZcgVZLCTmrDEGOgVq-D71IXjaC_-emlNo3-5qd3_JLl_Cd903Tk_gy_wJLgaieg1PHTU9DReuptghjIjtoN_v_0L5hqflM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2436397046</pqid></control><display><type>article</type><title>Predicting essential genes of 41 prokaryotes by a semi-supervised method</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Liu, Xiao ; He, Ting ; Guo, Zhirui ; Ren, Meixiang ; Luo, Yachuan</creator><creatorcontrib>Liu, Xiao ; He, Ting ; Guo, Zhirui ; Ren, Meixiang ; Luo, Yachuan</creatorcontrib><description>Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. 
LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value of 41 organisms in intra-organisms prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality. [Display omitted] •The semi-supervised learning methods were widely used and perform well.•The graph-based semi-supervised learning methods LGC was used to construct the essential genes classifier.•The results illustrate that the ratio of labeled samples affects the performance of the prediction model slightly.•The LGC-based prediction model performs well when there are a few labeled samples.•The results of leave-one-species-out prediction demonstrate that the proposed method has good universality.</description><identifier>ISSN: 0003-2697</identifier><identifier>EISSN: 1096-0309</identifier><identifier>DOI: 10.1016/j.ab.2020.113919</identifier><identifier>PMID: 32827465</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Area Under Curve ; Databases, Genetic ; Essential genes ; Genes, Essential - genetics ; Label information ; LGC ; Prokaryotes ; Prokaryotic Cells - metabolism ; ROC Curve ; Semi-supervised ; Supervised Machine Learning</subject><ispartof>Analytical biochemistry, 2020-11, Vol.609, p.113919-113919, Article 113919</ispartof><rights>2020 Elsevier Inc.</rights><rights>Copyright © 2020 Elsevier Inc. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c350t-f5db85411e31f2eda877e70563d348b121ac4cce245e46b8980c110dfc98b6033</citedby><cites>FETCH-LOGICAL-c350t-f5db85411e31f2eda877e70563d348b121ac4cce245e46b8980c110dfc98b6033</cites><orcidid>0000-0002-7042-3880</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0003269720304516$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32827465$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Liu, Xiao</creatorcontrib><creatorcontrib>He, Ting</creatorcontrib><creatorcontrib>Guo, Zhirui</creatorcontrib><creatorcontrib>Ren, Meixiang</creatorcontrib><creatorcontrib>Luo, Yachuan</creatorcontrib><title>Predicting essential genes of 41 prokaryotes by a semi-supervised method</title><title>Analytical biochemistry</title><addtitle>Anal Biochem</addtitle><description>Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. 
The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value of 41 organisms in intra-organisms prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality. [Display omitted] •The semi-supervised learning methods were widely used and perform well.•The graph-based semi-supervised learning methods LGC was used to construct the essential genes classifier.•The results illustrate that the ratio of labeled samples affects the performance of the prediction model slightly.•The LGC-based prediction model performs well when there are a few labeled samples.•The results of leave-one-species-out prediction demonstrate that the proposed method has good universality.</description><subject>Area Under Curve</subject><subject>Databases, Genetic</subject><subject>Essential genes</subject><subject>Genes, Essential - genetics</subject><subject>Label information</subject><subject>LGC</subject><subject>Prokaryotes</subject><subject>Prokaryotic Cells - metabolism</subject><subject>ROC Curve</subject><subject>Semi-supervised</subject><subject>Supervised Machine 
Learning</subject><issn>0003-2697</issn><issn>1096-0309</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp1kDFPwzAQhS0EglLYmVBGlpQ723ESNoSAIlWCAWbLsS_g0iTFTpH67zEqsDGd7vTe072PsTOEGQKqy-XMNDMOPK0oaqz32AShVjkIqPfZBABEzlVdHrHjGJcAiLJQh-xI8IqXUhUTNn8K5Lwdff-aUYzUj96sslfqKWZDm0nM1mF4N2E7jOnSbDOTRep8HjdrCp8-kss6Gt8Gd8IOWrOKdPozp-zl7vb5Zp4vHu8fbq4XuRUFjHlbuKYqJCIJbDk5U5UllVAo4YSsGuRorLSWuCxIqqaqK7CI4FpbV40CIabsYpeb_vrYUBx156Ol1cr0NGyi5lIoUZcgVZLCTmrDEGOgVq-D71IXjaC_-emlNo3-5qd3_JLl_Cd903Tk_gy_wJLgaieg1PHTU9DReuptghjIjtoN_v_0L5hqflM</recordid><startdate>20201115</startdate><enddate>20201115</enddate><creator>Liu, Xiao</creator><creator>He, Ting</creator><creator>Guo, Zhirui</creator><creator>Ren, Meixiang</creator><creator>Luo, Yachuan</creator><general>Elsevier Inc</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-7042-3880</orcidid></search><sort><creationdate>20201115</creationdate><title>Predicting essential genes of 41 prokaryotes by a semi-supervised method</title><author>Liu, Xiao ; He, Ting ; Guo, Zhirui ; Ren, Meixiang ; Luo, Yachuan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c350t-f5db85411e31f2eda877e70563d348b121ac4cce245e46b8980c110dfc98b6033</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Area Under Curve</topic><topic>Databases, Genetic</topic><topic>Essential genes</topic><topic>Genes, Essential - genetics</topic><topic>Label information</topic><topic>LGC</topic><topic>Prokaryotes</topic><topic>Prokaryotic Cells - metabolism</topic><topic>ROC Curve</topic><topic>Semi-supervised</topic><topic>Supervised Machine 
Learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Xiao</creatorcontrib><creatorcontrib>He, Ting</creatorcontrib><creatorcontrib>Guo, Zhirui</creatorcontrib><creatorcontrib>Ren, Meixiang</creatorcontrib><creatorcontrib>Luo, Yachuan</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Analytical biochemistry</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Xiao</au><au>He, Ting</au><au>Guo, Zhirui</au><au>Ren, Meixiang</au><au>Luo, Yachuan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Predicting essential genes of 41 prokaryotes by a semi-supervised method</atitle><jtitle>Analytical biochemistry</jtitle><addtitle>Anal Biochem</addtitle><date>2020-11-15</date><risdate>2020</risdate><volume>609</volume><spage>113919</spage><epage>113919</epage><pages>113919-113919</pages><artnum>113919</artnum><issn>0003-2697</issn><eissn>1096-0309</eissn><abstract>Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. 
The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value of 41 organisms in intra-organisms prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality. [Display omitted] •The semi-supervised learning methods were widely used and perform well.•The graph-based semi-supervised learning methods LGC was used to construct the essential genes classifier.•The results illustrate that the ratio of labeled samples affects the performance of the prediction model slightly.•The LGC-based prediction model performs well when there are a few labeled samples.•The results of leave-one-species-out prediction demonstrate that the proposed method has good universality.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>32827465</pmid><doi>10.1016/j.ab.2020.113919</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-7042-3880</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0003-2697
ispartof Analytical biochemistry, 2020-11, Vol.609, p.113919-113919, Article 113919
issn 0003-2697
1096-0309
language eng
recordid cdi_proquest_miscellaneous_2436397046
source MEDLINE; Elsevier ScienceDirect Journals
subjects Area Under Curve
Databases, Genetic
Essential genes
Genes, Essential - genetics
Label information
LGC
Prokaryotes
Prokaryotic Cells - metabolism
ROC Curve
Semi-supervised
Supervised Machine Learning
title Predicting essential genes of 41 prokaryotes by a semi-supervised method
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T09%3A59%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20essential%20genes%20of%2041%20prokaryotes%20by%20a%20semi-supervised%20method&rft.jtitle=Analytical%20biochemistry&rft.au=Liu,%20Xiao&rft.date=2020-11-15&rft.volume=609&rft.spage=113919&rft.epage=113919&rft.pages=113919-113919&rft.artnum=113919&rft.issn=0003-2697&rft.eissn=1096-0309&rft_id=info:doi/10.1016/j.ab.2020.113919&rft_dat=%3Cproquest_cross%3E2436397046%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2436397046&rft_id=info:pmid/32827465&rft_els_id=S0003269720304516&rfr_iscdi=true
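
The description above characterises LGC as a graph-based semi-supervised method that builds a model from a small number of labels. As a rough illustration only, the following sketch runs the standard LGC label-propagation iteration on a toy data set. This record does not state the features, graph construction, or parameters used in the paper, so the Gaussian kernel, `sigma`, `alpha`, the iteration count, the function name `lgc_scores`, and the toy points are all assumptions made for this example. The binary case is collapsed into a single signed score per gene rather than one column per class.

```python
import math


def lgc_scores(points, labels, alpha=0.9, sigma=1.0, iterations=200):
    """Propagate labels over a similarity graph (LGC iteration).

    points: list of feature vectors (lists of floats), at least two.
    labels: +1 (essential), -1 (non-essential), 0 (unlabeled).
    Returns one signed score per point; the sign is the predicted class.
    """
    n = len(points)
    # Gaussian affinity matrix W with a zero diagonal (no self-loops).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-dist2 / (2.0 * sigma ** 2))
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2).
    # Assumes every point has a nonzero degree (true for this toy data).
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    y = [float(v) for v in labels]
    f = list(y)
    # F(t+1) = alpha * S * F(t) + (1 - alpha) * Y
    for _ in range(iterations):
        f = [
            alpha * sum(s[i][j] * f[j] for j in range(n)) + (1.0 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Hypothetical toy data: two well-separated clusters, one labeled gene in each.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
labels = [1, 0, 0, -1, 0, 0]
scores = lgc_scores(points, labels)
predicted = [1 if v > 0 else -1 for v in scores]
print(predicted)
```

In this toy setting each unlabeled point takes the sign of the labeled point in its own cluster, which is the behaviour the description relies on when only part of the data is labeled. A real application would replace the toy points with per-gene feature vectors and would tune `alpha` and `sigma`.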