Predicting essential genes of 41 prokaryotes by a semi-supervised method
Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result...
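Published in: | Analytical biochemistry 2020-11, Vol.609, p.113919-113919, Article 113919 |
---|---|
Main authors: | Liu, Xiao; He, Ting; Guo, Zhirui; Ren, Meixiang; Luo, Yachuan |
Format: | Article |
Language: | eng |
Subjects: | Area Under Curve; Databases, Genetic; Essential genes; Genes, Essential - genetics; Label information; LGC; Prokaryotes; Prokaryotic Cells - metabolism; ROC Curve; Semi-supervised; Supervised Machine Learning |
Online access: | Full text |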
container_end_page | 113919 |
---|---|
container_issue | |
container_start_page | 113919 |
container_title | Analytical biochemistry |
container_volume | 609 |
creator | Liu, Xiao; He, Ting; Guo, Zhirui; Ren, Meixiang; Luo, Yachuan |
description | Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value across the 41 organisms in intra-organism prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality.
• The semi-supervised learning methods were widely used and perform well.
• The graph-based semi-supervised learning method LGC was used to construct the essential genes classifier.
• The results illustrate that the ratio of labeled samples affects the performance of the prediction model slightly.
• The LGC-based prediction model performs well when there are a few labeled samples.
• The results of leave-one-species-out prediction demonstrate that the proposed method has good universality. |
doi_str_mv | 10.1016/j.ab.2020.113919 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2436397046</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0003269720304516</els_id><sourcerecordid>2436397046</sourcerecordid><originalsourceid>FETCH-LOGICAL-c350t-f5db85411e31f2eda877e70563d348b121ac4cce245e46b8980c110dfc98b6033</originalsourceid><addsrcrecordid>eNp1kDFPwzAQhS0EglLYmVBGlpQ723ESNoSAIlWCAWbLsS_g0iTFTpH67zEqsDGd7vTe072PsTOEGQKqy-XMNDMOPK0oaqz32AShVjkIqPfZBABEzlVdHrHjGJcAiLJQh-xI8IqXUhUTNn8K5Lwdff-aUYzUj96sslfqKWZDm0nM1mF4N2E7jOnSbDOTRep8HjdrCp8-kss6Gt8Gd8IOWrOKdPozp-zl7vb5Zp4vHu8fbq4XuRUFjHlbuKYqJCIJbDk5U5UllVAo4YSsGuRorLSWuCxIqqaqK7CI4FpbV40CIabsYpeb_vrYUBx156Ol1cr0NGyi5lIoUZcgVZLCTmrDEGOgVq-D71IXjaC_-emlNo3-5qd3_JLl_Cd903Tk_gy_wJLgaieg1PHTU9DReuptghjIjtoN_v_0L5hqflM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2436397046</pqid></control><display><type>article</type><title>Predicting essential genes of 41 prokaryotes by a semi-supervised method</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Liu, Xiao ; He, Ting ; Guo, Zhirui ; Ren, Meixiang ; Luo, Yachuan</creator><creatorcontrib>Liu, Xiao ; He, Ting ; Guo, Zhirui ; Ren, Meixiang ; Luo, Yachuan</creatorcontrib><description>Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. 
LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value of 41 organisms in intra-organisms prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality.
[Display omitted]
•The semi-supervised learning methods were widely used and perform well.•The graph-based semi-supervised learning methods LGC was used to construct the essential genes classifier.•The results illustrate that the ratio of labeled samples affects the performance of the prediction model slightly.•The LGC-based prediction model performs well when there are a few labeled samples.•The results of leave-one-species-out prediction demonstrate that the proposed method has good universality.</description><identifier>ISSN: 0003-2697</identifier><identifier>EISSN: 1096-0309</identifier><identifier>DOI: 10.1016/j.ab.2020.113919</identifier><identifier>PMID: 32827465</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Area Under Curve ; Databases, Genetic ; Essential genes ; Genes, Essential - genetics ; Label information ; LGC ; Prokaryotes ; Prokaryotic Cells - metabolism ; ROC Curve ; Semi-supervised ; Supervised Machine Learning</subject><ispartof>Analytical biochemistry, 2020-11, Vol.609, p.113919-113919, Article 113919</ispartof><rights>2020 Elsevier Inc.</rights><rights>Copyright © 2020 Elsevier Inc. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c350t-f5db85411e31f2eda877e70563d348b121ac4cce245e46b8980c110dfc98b6033</citedby><cites>FETCH-LOGICAL-c350t-f5db85411e31f2eda877e70563d348b121ac4cce245e46b8980c110dfc98b6033</cites><orcidid>0000-0002-7042-3880</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0003269720304516$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32827465$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Liu, Xiao</creatorcontrib><creatorcontrib>He, Ting</creatorcontrib><creatorcontrib>Guo, Zhirui</creatorcontrib><creatorcontrib>Ren, Meixiang</creatorcontrib><creatorcontrib>Luo, Yachuan</creatorcontrib><title>Predicting essential genes of 41 prokaryotes by a semi-supervised method</title><title>Analytical biochemistry</title><addtitle>Anal Biochem</addtitle><description>Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. 
The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value of 41 organisms in intra-organisms prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality.
[Display omitted]
•The semi-supervised learning methods were widely used and perform well.•The graph-based semi-supervised learning methods LGC was used to construct the essential genes classifier.•The results illustrate that the ratio of labeled samples affects the performance of the prediction model slightly.•The LGC-based prediction model performs well when there are a few labeled samples.•The results of leave-one-species-out prediction demonstrate that the proposed method has good universality.</description><subject>Area Under Curve</subject><subject>Databases, Genetic</subject><subject>Essential genes</subject><subject>Genes, Essential - genetics</subject><subject>Label information</subject><subject>LGC</subject><subject>Prokaryotes</subject><subject>Prokaryotic Cells - metabolism</subject><subject>ROC Curve</subject><subject>Semi-supervised</subject><subject>Supervised Machine Learning</subject><issn>0003-2697</issn><issn>1096-0309</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp1kDFPwzAQhS0EglLYmVBGlpQ723ESNoSAIlWCAWbLsS_g0iTFTpH67zEqsDGd7vTe072PsTOEGQKqy-XMNDMOPK0oaqz32AShVjkIqPfZBABEzlVdHrHjGJcAiLJQh-xI8IqXUhUTNn8K5Lwdff-aUYzUj96sslfqKWZDm0nM1mF4N2E7jOnSbDOTRep8HjdrCp8-kss6Gt8Gd8IOWrOKdPozp-zl7vb5Zp4vHu8fbq4XuRUFjHlbuKYqJCIJbDk5U5UllVAo4YSsGuRorLSWuCxIqqaqK7CI4FpbV40CIabsYpeb_vrYUBx156Ol1cr0NGyi5lIoUZcgVZLCTmrDEGOgVq-D71IXjaC_-emlNo3-5qd3_JLl_Cd903Tk_gy_wJLgaieg1PHTU9DReuptghjIjtoN_v_0L5hqflM</recordid><startdate>20201115</startdate><enddate>20201115</enddate><creator>Liu, Xiao</creator><creator>He, Ting</creator><creator>Guo, Zhirui</creator><creator>Ren, Meixiang</creator><creator>Luo, Yachuan</creator><general>Elsevier 
Inc</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-7042-3880</orcidid></search><sort><creationdate>20201115</creationdate><title>Predicting essential genes of 41 prokaryotes by a semi-supervised method</title><author>Liu, Xiao ; He, Ting ; Guo, Zhirui ; Ren, Meixiang ; Luo, Yachuan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c350t-f5db85411e31f2eda877e70563d348b121ac4cce245e46b8980c110dfc98b6033</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Area Under Curve</topic><topic>Databases, Genetic</topic><topic>Essential genes</topic><topic>Genes, Essential - genetics</topic><topic>Label information</topic><topic>LGC</topic><topic>Prokaryotes</topic><topic>Prokaryotic Cells - metabolism</topic><topic>ROC Curve</topic><topic>Semi-supervised</topic><topic>Supervised Machine Learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Xiao</creatorcontrib><creatorcontrib>He, Ting</creatorcontrib><creatorcontrib>Guo, Zhirui</creatorcontrib><creatorcontrib>Ren, Meixiang</creatorcontrib><creatorcontrib>Luo, Yachuan</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Analytical biochemistry</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Xiao</au><au>He, Ting</au><au>Guo, Zhirui</au><au>Ren, Meixiang</au><au>Luo, Yachuan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Predicting essential 
genes of 41 prokaryotes by a semi-supervised method</atitle><jtitle>Analytical biochemistry</jtitle><addtitle>Anal Biochem</addtitle><date>2020-11-15</date><risdate>2020</risdate><volume>609</volume><spage>113919</spage><epage>113919</epage><pages>113919-113919</pages><artnum>113919</artnum><issn>0003-2697</issn><eissn>1096-0309</eissn><abstract>Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value of 41 organisms in intra-organisms prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality.
[Display omitted]
•The semi-supervised learning methods were widely used and perform well.•The graph-based semi-supervised learning methods LGC was used to construct the essential genes classifier.•The results illustrate that the ratio of labeled samples affects the performance of the prediction model slightly.•The LGC-based prediction model performs well when there are a few labeled samples.•The results of leave-one-species-out prediction demonstrate that the proposed method has good universality.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>32827465</pmid><doi>10.1016/j.ab.2020.113919</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-7042-3880</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0003-2697 |
ispartof | Analytical biochemistry, 2020-11, Vol.609, p.113919-113919, Article 113919 |
issn | 0003-2697; 1096-0309 |
language | eng |
recordid | cdi_proquest_miscellaneous_2436397046 |
source | MEDLINE; Elsevier ScienceDirect Journals |
subjects | Area Under Curve; Databases, Genetic; Essential genes; Genes, Essential - genetics; Label information; LGC; Prokaryotes; Prokaryotic Cells - metabolism; ROC Curve; Semi-supervised; Supervised Machine Learning |
title | Predicting essential genes of 41 prokaryotes by a semi-supervised method |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T09%3A59%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Predicting%20essential%20genes%20of%2041%20prokaryotes%20by%20a%20semi-supervised%20method&rft.jtitle=Analytical%20biochemistry&rft.au=Liu,%20Xiao&rft.date=2020-11-15&rft.volume=609&rft.spage=113919&rft.epage=113919&rft.pages=113919-113919&rft.artnum=113919&rft.issn=0003-2697&rft.eissn=1096-0309&rft_id=info:doi/10.1016/j.ab.2020.113919&rft_dat=%3Cproquest_cross%3E2436397046%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2436397046&rft_id=info:pmid/32827465&rft_els_id=S0003269720304516&rfr_iscdi=true |
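
The description above names learning with local and global consistency (LGC) as a graph-based semi-supervised method but does not give its steps. As a rough illustration only, the standard LGC iteration can be sketched in plain Python as below. This is not the authors' implementation: the record does not state which gene features, graph construction, or parameters the paper used, so the function name `lgc`, the toy 2-D feature vectors, the Gaussian affinity, the values of `alpha` and `sigma`, and the coding 1 = essential / 0 = non-essential are all assumptions made for the example.

```python
import math


def lgc(points, labels, alpha=0.99, sigma=1.0, iters=200):
    """Label spreading by local and global consistency (illustrative sketch).

    points: list of feature vectors (hypothetical gene features).
    labels: 1 (assumed "essential"), 0 (assumed "non-essential"),
            or None for unlabeled genes.
    Returns one predicted class (0 or 1) per point.
    """
    n = len(points)

    # Step 1: Gaussian affinity matrix W with a zero diagonal.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))

    # Step 2: symmetric normalization S = D^(-1/2) W D^(-1/2).
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]

    # Step 3: initial label matrix Y (one column per class, zeros if unlabeled).
    Y = [[1.0 if labels[i] == c else 0.0 for c in (0, 1)] for i in range(n)]

    # Step 4: iterate F <- alpha * S * F + (1 - alpha) * Y.
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(2)]
             for i in range(n)]

    # Step 5: assign each point the class with the larger score.
    return [0 if f[0] >= f[1] else 1 for f in F]


# Toy data: two well-separated clusters, one labeled point in each.
toy_points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
              (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
toy_labels = [0, None, None, 1, None, None]
print(lgc(toy_points, toy_labels))
```

The iteration lets each labeled point pass its label to nearby points through the normalized graph, while the `(1 - alpha) * Y` term keeps the scores anchored to the known labels, which is why a small labeled fraction can still be enough to classify the rest.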