United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins
Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirement for cellular life. Computational methods for essential proteins discovery overcome the disadvantages of biological experimental methods that are often time-consuming, expen...
Published in: | IEEE/ACM transactions on computational biology and bioinformatics 2020-07, Vol.17 (4), p.1451-1458 |
---|---|
Main authors: | Li, Gaoshi ; Li, Min ; Wang, Jianxin ; Li, Yaohang ; Pan, Yi |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1458 |
---|---|
container_issue | 4 |
container_start_page | 1451 |
container_title | IEEE/ACM transactions on computational biology and bioinformatics |
container_volume | 17 |
creator | Li, Gaoshi ; Li, Min ; Wang, Jianxin ; Li, Yaohang ; Pan, Yi |
description | Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirement for cellular life. Computational methods for essential proteins discovery overcome the disadvantages of biological experimental methods that are often time-consuming, expensive, and inefficient. The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), and Neighborhood Centrality (NC). However, the prediction accuracies of these individual methods still have space to be improved. Studies show that additional information, such as orthologous relations, helps discover essential proteins. Many researchers have proposed different methods by combining multiple information sources to gain improvement of prediction accuracy. In this study, we find that essential proteins appear in triangular structure in PPI network significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure, so-called Neighborhood Closeness Centrality (NCC). Accordingly, we develop a new combination model, Extended Pareto Optimality Consensus model, named EPOC, to fuse NCC and Orthology information and a novel essential proteins identification method, NCCO, is fully proposed. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC, and NC) and three consensus methods (PeC, ION, and CSC), our results on S.cerevisiae and E.coli datasets show that NCCO has clear advantages. As a consensus method, EPOC also yields better performance than the random walk model. |
doi_str_mv | 10.1109/TCBB.2018.2889978 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_2162493186</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8590747</ieee_id><sourcerecordid>2431086850</sourcerecordid><originalsourceid>FETCH-LOGICAL-c349t-85bb6c7e4f119057bc1373b990b5e11fc9e4e053007bda14f2e02560fb8913843</originalsourceid><addsrcrecordid>eNpdkE1r3DAQhkVoab76A0KgGHLJxdsZS7KlY9ekHxCaHhLITVj2eFfBa6WS9rD_vlp2k0NPM4yedxg9jF0hLBBBf31sl8tFBagWlVJaN-qEnaGUTal1LT7seyFLqWt-ys5jfAGohAbxiZ1yyFOpqjP2_DS7REPxm9xqbX1Yez8U7eQjzRRj0dKcQje5tCu6eSgeQlr7ya92xehD8SfQ4Prk5lVxF3MguW7KQ5_IzfGSfRy7KdLnY71gT9_vHtuf5f3Dj1_tt_uy50KnUklr674hMSJqkI3tkTfcag1WEuLYaxIEkgM0duhQjBVBJWsYrdLIleAX7Paw9zX4v1uKyWxc7Gmaupn8NpoK6_xrjqrO6M1_6IvfhjlfZyrBEVStJGQKD1QffIyBRvMa3KYLO4Ng9trNXrvZazdH7Tnz5bh5azc0vCfePGfg-gA4Inp_VlJDIxr-Dwc0hdk</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2431086850</pqid></control><display><type>article</type><title>United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins</title><source>IEEE Electronic Library (IEL)</source><creator>Li, Gaoshi ; Li, Min ; Wang, Jianxin ; Li, Yaohang ; Pan, Yi</creator><creatorcontrib>Li, Gaoshi ; Li, Min ; Wang, Jianxin ; Li, Yaohang ; Pan, Yi</creatorcontrib><description>Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirement for cellular life. Computational methods for essential proteins discovery overcome the disadvantages of biological experimental methods that are often time-consuming, expensive, and inefficient. 
The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), and Neighborhood Centrality (NC). However, the prediction accuracies of these individual methods still have space to be improved. Studies show that additional information, such as orthologous relations, helps discover essential proteins. Many researchers have proposed different methods by combining multiple information sources to gain improvement of prediction accuracy. In this study, we find that essential proteins appear in triangular structure in PPI network significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure, so-called Neighborhood Closeness Centrality (NCC). Accordingly, we develop a new combination model, Extended Pareto Optimality Consensus model, named EPOC, to fuse NCC and Orthology information and a novel essential proteins identification method, NCCO, is fully proposed. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC, and NC) and three consensus methods (PeC, ION, and CSC), our results on S.cerevisiae and E.coli datasets show that NCCO has clear advantages. 
As a consensus method, EPOC also yields better performance than the random walk model.</description><identifier>ISSN: 1545-5963</identifier><identifier>EISSN: 1557-9964</identifier><identifier>DOI: 10.1109/TCBB.2018.2889978</identifier><identifier>PMID: 30596582</identifier><identifier>CODEN: ITCBCY</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Accuracy ; Algorithms ; Computational Biology - methods ; Computer applications ; Databases, Protein ; Drug development ; E coli ; Eigenvectors ; EPOC ; Escherichia coli Proteins - chemistry ; Escherichia coli Proteins - metabolism ; essential proteins ; Experimental methods ; Fuses ; Gene expression ; Information sources ; Integrated circuit modeling ; Models, Biological ; Neighborhood closeness centrality ; Neighborhoods ; orthologous ; Orthology ; PPI network ; Predictions ; Protein interaction ; Protein Interaction Maps ; Proteins ; Random walk ; Research methodology ; Saccharomyces cerevisiae Proteins - chemistry ; Saccharomyces cerevisiae Proteins - metabolism</subject><ispartof>IEEE/ACM transactions on computational biology and bioinformatics, 2020-07, Vol.17 (4), p.1451-1458</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c349t-85bb6c7e4f119057bc1373b990b5e11fc9e4e053007bda14f2e02560fb8913843</citedby><cites>FETCH-LOGICAL-c349t-85bb6c7e4f119057bc1373b990b5e11fc9e4e053007bda14f2e02560fb8913843</cites><orcidid>0000-0002-2766-3096 ; 0000-0002-0188-1394 ; 0000-0003-0178-1876 ; 0000-0003-1516-0480</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8590747$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8590747$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30596582$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Gaoshi</creatorcontrib><creatorcontrib>Li, Min</creatorcontrib><creatorcontrib>Wang, Jianxin</creatorcontrib><creatorcontrib>Li, Yaohang</creatorcontrib><creatorcontrib>Pan, Yi</creatorcontrib><title>United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins</title><title>IEEE/ACM transactions on computational biology and bioinformatics</title><addtitle>TCBB</addtitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><description>Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirement for cellular life. Computational methods for essential proteins discovery overcome the disadvantages of biological experimental methods that are often time-consuming, expensive, and inefficient. 
The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), and Neighborhood Centrality (NC). However, the prediction accuracies of these individual methods still have space to be improved. Studies show that additional information, such as orthologous relations, helps discover essential proteins. Many researchers have proposed different methods by combining multiple information sources to gain improvement of prediction accuracy. In this study, we find that essential proteins appear in triangular structure in PPI network significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure, so-called Neighborhood Closeness Centrality (NCC). Accordingly, we develop a new combination model, Extended Pareto Optimality Consensus model, named EPOC, to fuse NCC and Orthology information and a novel essential proteins identification method, NCCO, is fully proposed. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC, and NC) and three consensus methods (PeC, ION, and CSC), our results on S.cerevisiae and E.coli datasets show that NCCO has clear advantages. 
As a consensus method, EPOC also yields better performance than the random walk model.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Computational Biology - methods</subject><subject>Computer applications</subject><subject>Databases, Protein</subject><subject>Drug development</subject><subject>E coli</subject><subject>Eigenvectors</subject><subject>EPOC</subject><subject>Escherichia coli Proteins - chemistry</subject><subject>Escherichia coli Proteins - metabolism</subject><subject>essential proteins</subject><subject>Experimental methods</subject><subject>Fuses</subject><subject>Gene expression</subject><subject>Information sources</subject><subject>Integrated circuit modeling</subject><subject>Models, Biological</subject><subject>Neighborhood closeness centrality</subject><subject>Neighborhoods</subject><subject>orthologous</subject><subject>Orthology</subject><subject>PPI network</subject><subject>Predictions</subject><subject>Protein interaction</subject><subject>Protein Interaction Maps</subject><subject>Proteins</subject><subject>Random walk</subject><subject>Research methodology</subject><subject>Saccharomyces cerevisiae Proteins - chemistry</subject><subject>Saccharomyces cerevisiae Proteins - 
metabolism</subject><issn>1545-5963</issn><issn>1557-9964</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkE1r3DAQhkVoab76A0KgGHLJxdsZS7KlY9ekHxCaHhLITVj2eFfBa6WS9rD_vlp2k0NPM4yedxg9jF0hLBBBf31sl8tFBagWlVJaN-qEnaGUTal1LT7seyFLqWt-ys5jfAGohAbxiZ1yyFOpqjP2_DS7REPxm9xqbX1Yez8U7eQjzRRj0dKcQje5tCu6eSgeQlr7ya92xehD8SfQ4Prk5lVxF3MguW7KQ5_IzfGSfRy7KdLnY71gT9_vHtuf5f3Dj1_tt_uy50KnUklr674hMSJqkI3tkTfcag1WEuLYaxIEkgM0duhQjBVBJWsYrdLIleAX7Paw9zX4v1uKyWxc7Gmaupn8NpoK6_xrjqrO6M1_6IvfhjlfZyrBEVStJGQKD1QffIyBRvMa3KYLO4Ng9trNXrvZazdH7Tnz5bh5azc0vCfePGfg-gA4Inp_VlJDIxr-Dwc0hdk</recordid><startdate>202007</startdate><enddate>202007</enddate><creator>Li, Gaoshi</creator><creator>Li, Min</creator><creator>Wang, Jianxin</creator><creator>Li, Yaohang</creator><creator>Pan, Yi</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-2766-3096</orcidid><orcidid>https://orcid.org/0000-0002-0188-1394</orcidid><orcidid>https://orcid.org/0000-0003-0178-1876</orcidid><orcidid>https://orcid.org/0000-0003-1516-0480</orcidid></search><sort><creationdate>202007</creationdate><title>United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins</title><author>Li, Gaoshi ; Li, Min ; Wang, Jianxin ; Li, Yaohang ; Pan, Yi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c349t-85bb6c7e4f119057bc1373b990b5e11fc9e4e053007bda14f2e02560fb8913843</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Computational Biology - methods</topic><topic>Computer applications</topic><topic>Databases, Protein</topic><topic>Drug development</topic><topic>E coli</topic><topic>Eigenvectors</topic><topic>EPOC</topic><topic>Escherichia coli Proteins - chemistry</topic><topic>Escherichia coli Proteins - metabolism</topic><topic>essential proteins</topic><topic>Experimental methods</topic><topic>Fuses</topic><topic>Gene expression</topic><topic>Information sources</topic><topic>Integrated circuit modeling</topic><topic>Models, Biological</topic><topic>Neighborhood closeness 
centrality</topic><topic>Neighborhoods</topic><topic>orthologous</topic><topic>Orthology</topic><topic>PPI network</topic><topic>Predictions</topic><topic>Protein interaction</topic><topic>Protein Interaction Maps</topic><topic>Proteins</topic><topic>Random walk</topic><topic>Research methodology</topic><topic>Saccharomyces cerevisiae Proteins - chemistry</topic><topic>Saccharomyces cerevisiae Proteins - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Gaoshi</creatorcontrib><creatorcontrib>Li, Min</creatorcontrib><creatorcontrib>Wang, Jianxin</creatorcontrib><creatorcontrib>Li, Yaohang</creatorcontrib><creatorcontrib>Pan, Yi</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace 
Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Li, Gaoshi</au><au>Li, Min</au><au>Wang, Jianxin</au><au>Li, Yaohang</au><au>Pan, Yi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins</atitle><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle><stitle>TCBB</stitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><date>2020-07</date><risdate>2020</risdate><volume>17</volume><issue>4</issue><spage>1451</spage><epage>1458</epage><pages>1451-1458</pages><issn>1545-5963</issn><eissn>1557-9964</eissn><coden>ITCBCY</coden><abstract>Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirement for cellular life. Computational methods for essential proteins discovery overcome the disadvantages of biological experimental methods that are often time-consuming, expensive, and inefficient. 
The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), and Neighborhood Centrality (NC). However, the prediction accuracies of these individual methods still have space to be improved. Studies show that additional information, such as orthologous relations, helps discover essential proteins. Many researchers have proposed different methods by combining multiple information sources to gain improvement of prediction accuracy. In this study, we find that essential proteins appear in triangular structure in PPI network significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure, so-called Neighborhood Closeness Centrality (NCC). Accordingly, we develop a new combination model, Extended Pareto Optimality Consensus model, named EPOC, to fuse NCC and Orthology information and a novel essential proteins identification method, NCCO, is fully proposed. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC, and NC) and three consensus methods (PeC, ION, and CSC), our results on S.cerevisiae and E.coli datasets show that NCCO has clear advantages. As a consensus method, EPOC also yields better performance than the random walk model.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>30596582</pmid><doi>10.1109/TCBB.2018.2889978</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-2766-3096</orcidid><orcidid>https://orcid.org/0000-0002-0188-1394</orcidid><orcidid>https://orcid.org/0000-0003-0178-1876</orcidid><orcidid>https://orcid.org/0000-0003-1516-0480</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1545-5963 |
ispartof | IEEE/ACM transactions on computational biology and bioinformatics, 2020-07, Vol.17 (4), p.1451-1458 |
issn | 1545-5963 ; 1557-9964 |
language | eng |
recordid | cdi_proquest_miscellaneous_2162493186 |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy ; Algorithms ; Computational Biology - methods ; Computer applications ; Databases, Protein ; Drug development ; E coli ; Eigenvectors ; EPOC ; Escherichia coli Proteins - chemistry ; Escherichia coli Proteins - metabolism ; essential proteins ; Experimental methods ; Fuses ; Gene expression ; Information sources ; Integrated circuit modeling ; Models, Biological ; Neighborhood closeness centrality ; Neighborhoods ; orthologous ; Orthology ; PPI network ; Predictions ; Protein interaction ; Protein Interaction Maps ; Proteins ; Random walk ; Research methodology ; Saccharomyces cerevisiae Proteins - chemistry ; Saccharomyces cerevisiae Proteins - metabolism |
title | United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T15%3A40%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=United%20Neighborhood%20Closeness%20Centrality%20and%20Orthology%20for%20Predicting%20Essential%20Proteins&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Li,%20Gaoshi&rft.date=2020-07&rft.volume=17&rft.issue=4&rft.spage=1451&rft.epage=1458&rft.pages=1451-1458&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2018.2889978&rft_dat=%3Cproquest_RIE%3E2431086850%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2431086850&rft_id=info:pmid/30596582&rft_ieee_id=8590747&rfr_iscdi=true |
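
The description field in this record names two topological ideas: Closeness Centrality (CC) and the observation that essential proteins appear in triangular structures of the PPI network more often than nonessential ones. The sketch below illustrates only those two standard quantities on a small, made-up graph. It is not the NCC measure or the EPOC model from the article, whose definitions are not given in this record; the node names, the edge list, and the helper functions `build_graph`, `triangle_counts`, and `closeness` are all hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def triangle_counts(graph):
    """Count, for each node, the triangles it participates in."""
    counts = {}
    for node, nbrs in graph.items():
        # A triangle through `node` is a pair of its neighbors that are
        # themselves linked; summing over neighbors sees each pair twice.
        linked = sum(len(nbrs & graph[n]) for n in nbrs)
        counts[node] = linked // 2
    return counts


def closeness(graph, source):
    """Classic closeness: reachable nodes / sum of shortest-path lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search on an unweighted graph
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical toy network: A, B, C form a triangle; D hangs off C; E off D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
graph = build_graph(edges)
triangles = triangle_counts(graph)
scores = {node: closeness(graph, node) for node in graph}
```

In this toy graph the three triangle members each get a count of 1 and the two chain nodes get 0, while node C, which sits in the triangle and bridges to the chain, has the highest closeness. A real analysis would load a curated PPI network instead of a hand-written edge list.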