United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins

Bibliographic details
Published in: IEEE/ACM transactions on computational biology and bioinformatics, 2020-07, Vol. 17 (4), pp. 1451-1458
Main authors: Li, Gaoshi; Li, Min; Wang, Jianxin; Li, Yaohang; Pan, Yi
Format: Article
Language: English
Keywords:
Online access: Order full text
container_end_page 1458
container_issue 4
container_start_page 1451
container_title IEEE/ACM transactions on computational biology and bioinformatics
container_volume 17
creator Li, Gaoshi
Li, Min
Wang, Jianxin
Li, Yaohang
Pan, Yi
description Identifying essential proteins plays an important role in disease study, drug design, and understanding the minimal requirements for cellular life. Computational methods for essential protein discovery overcome the disadvantages of biological experimental methods, which are often time-consuming, expensive, and inefficient. The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), and Neighborhood Centrality (NC). However, the prediction accuracy of these individual methods still leaves room for improvement. Studies show that additional information, such as orthologous relations, helps discover essential proteins, and many researchers have proposed methods that combine multiple information sources to improve prediction accuracy. In this study, we find that essential proteins appear in triangular structures in PPI networks significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure, called Neighborhood Closeness Centrality (NCC). Accordingly, we develop a new combination model, the Extended Pareto Optimality Consensus (EPOC) model, to fuse NCC with orthology information, yielding a novel essential protein identification method, NCCO. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC, and NC) and three consensus methods (PeC, ION, and CSC), NCCO shows clear advantages in our results on the S. cerevisiae and E. coli datasets. As a consensus method, EPOC also yields better performance than the random walk model.
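This record does not give the formulas for NCC or EPOC, so the following is only a hedged illustration of the two ideas the abstract relies on, not the authors' method. It (a) counts how many triangles each protein participates in, the motif the authors report as enriched among essential proteins, and (b) ranks proteins by generic Pareto dominance over two scores, a triangle count and an orthology score. The toy network, the `ORTHOLOGY` values, and every function name are hypothetical.

```python
from itertools import combinations

# Hypothetical toy PPI network (undirected edge list); NOT data from the paper.
EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"),  # triangle A-B-C
    ("C", "D"), ("B", "D"),              # triangle B-C-D
    ("D", "E"), ("E", "F"),              # chain, no triangles
]

# Hypothetical orthology scores in [0, 1]; assumed for illustration only.
ORTHOLOGY = {"A": 0.2, "B": 0.9, "C": 0.6, "D": 0.4, "E": 0.8, "F": 0.1}


def adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbours)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_counts(adj):
    """Count triangles per node: pairs of neighbours of v that are themselves linked."""
    return {
        v: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        for v, nbrs in adj.items()
    }


def dominates(p, q):
    """True if score vector p is >= q in every component and > q in at least one."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def pareto_layers(scores):
    """Peel off successive non-dominated fronts (front 0 = best candidates)."""
    remaining = dict(scores)
    layers = []
    while remaining:
        front = sorted(
            k for k, p in remaining.items()
            if not any(dominates(q, p) for j, q in remaining.items() if j != k)
        )
        layers.append(front)
        for k in front:
            del remaining[k]
    return layers


adj = adjacency(EDGES)
tri = triangle_counts(adj)
scores = {v: (tri[v], ORTHOLOGY[v]) for v in adj}
layers = pareto_layers(scores)
print(tri)     # {'A': 1, 'B': 2, 'C': 2, 'D': 1, 'E': 0, 'F': 0}
print(layers)  # [['B'], ['C', 'E'], ['D'], ['A'], ['F']]
```

One reason to combine scores by Pareto dominance rather than a weighted sum is that dominance only compares values component by component, so integer triangle counts and fractional orthology scores need no normalization or weights.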
doi_str_mv 10.1109/TCBB.2018.2889978
format Article
pmid 30596582
coden ITCBCY
publisher United States: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
addtitle TCBB
IEEE/ACM Trans Comput Biol Bioinform
tpages 8
toplevel peer_reviewed
orcidid 0000-0002-2766-3096
0000-0002-0188-1394
0000-0003-0178-1876
0000-0003-1516-0480
linktohtml https://ieeexplore.ieee.org/document/8590747
backlink https://www.ncbi.nlm.nih.gov/pubmed/30596582
fulltext fulltext_linktorsrc
identifier ISSN: 1545-5963
ispartof IEEE/ACM transactions on computational biology and bioinformatics, 2020-07, Vol.17 (4), p.1451-1458
issn 1545-5963
1557-9964
language eng
recordid cdi_proquest_miscellaneous_2162493186
source IEEE Electronic Library (IEL)
subjects Accuracy
Algorithms
Computational Biology - methods
Computer applications
Databases, Protein
Drug development
E coli
Eigenvectors
EPOC
Escherichia coli Proteins - chemistry
Escherichia coli Proteins - metabolism
essential proteins
Experimental methods
Fuses
Gene expression
Information sources
Integrated circuit modeling
Models, Biological
Neighborhood closeness centrality
Neighborhoods
orthologous
Orthology
PPI network
Predictions
Protein interaction
Protein Interaction Maps
Proteins
Random walk
Research methodology
Saccharomyces cerevisiae Proteins - chemistry
Saccharomyces cerevisiae Proteins - metabolism
title United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T15%3A40%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=United%20Neighborhood%20Closeness%20Centrality%20and%20Orthology%20for%20Predicting%20Essential%20Proteins&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Li,%20Gaoshi&rft.date=2020-07&rft.volume=17&rft.issue=4&rft.spage=1451&rft.epage=1458&rft.pages=1451-1458&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2018.2889978&rft_dat=%3Cproquest_RIE%3E2431086850%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2431086850&rft_id=info:pmid/30596582&rft_ieee_id=8590747&rfr_iscdi=true