ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains

Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins of...
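
The headline figures in the full abstract (see the description field of this record) can be checked with a little arithmetic. The Python sketch below computes the ratio of the 20,728 inferred associations to the 1515 manually curated ones, and shows the usual balanced F-measure as the harmonic mean of precision and recall. It assumes the reported 0.95 is that balanced (F1) form. The record gives only the final score, so the precision and recall values are hypothetical placeholders and the function names are illustrative.

```python
def fold_increase(inferred: int, curated: int) -> float:
    """Ratio of inferred associations to manually curated ones."""
    return inferred / curated


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Counts taken from the abstract.
ratio = fold_increase(20728, 1515)
print(f"inferred / curated = {ratio:.2f}")

# Hypothetical precision and recall, NOT values reported in the record.
example_f = f_measure(0.95, 0.95)
print(f"example F-measure = {example_f:.2f}")
```

The ratio falls between 13 and 14, which is in line with the "13-fold" figure quoted in the abstract.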

Bibliographic details
Published in: BMC bioinformatics 2017-02, Vol.18 (1), p.107-107, Article 107
Main authors: Alborzi, Seyed Ziaeddin ; Devignes, Marie-Dominique ; Ritchie, David W
Format: Article
Language: English
Online access: Full text
container_end_page 107
container_issue 1
container_start_page 107
container_title BMC bioinformatics
container_volume 18
creator Alborzi, Seyed Ziaeddin
Devignes, Marie-Dominique
Ritchie, David W
description Many entries in the Protein Data Bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the Enzyme Commission (EC) numbering scheme. Although the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current online resources rarely provide a direct mapping from structure to function at the domain level. Because the PDB now contains many tens of thousands of protein chains, and protein sequence databases can exceed such numbers by orders of magnitude, there is a pressing need for automatic structure-function annotation tools that operate at the domain level. This article presents ECDomainMiner, a novel content-based filtering approach that automatically infers associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with an F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, this is a 13-fold increase in the number of EC-Pfam associations. These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB that currently lack any EC annotation. The ECDomainMiner database is publicly available at http://ecdm.loria.fr/.
doi_str_mv 10.1186/s12859-017-1519-x
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5307852</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>4317819021</sourcerecordid><originalsourceid>FETCH-LOGICAL-c461t-c98533badb26d5a257d8b754e1caaa4bd3728ade49e09608b60b33ffc2f81cad3</originalsourceid><addsrcrecordid>eNpdkU1PFTEUhhsjEUR_gBvTxI0uRvoxnXZcmJArCsk1utB1048z3JI7LbYzV-DX0-tFAqzanPO8b0_Pi9AbSj5SqrqjQpkSfUOobKigfXP1DB3QVtKGUSKeP7jvo5elXJAKKiJeoH2maM-p6A6QOVl8SaMJ8XuIkD9hH4pLG8ghnuNV8B4iNqUkF8wUUizYwvQXahHizfUI2KVxDKXUFo7zaCEXbKLHPwczYv_Pt7xCe4NZF3h9dx6i319Pfi1Om-WPb2eL42Xj2o5OjeuV4Nwab1nnhWFCemWlaIE6Y0xrPZdMGQ9tD6TviLIdsZwPg2ODqojnh-jzzvdytiN4B3HKZq0vcxhNvtbJBP24E8NKn6eNFpxIJVg1-LAzWD2RnR4v9bZGaNt1qmUbWtn3d4_l9GeGMum6BgfrtYmQ5qJrOor3rRSyou-eoBdpzrGuolKSs75Gt6XojnI5lZJhuJ-AEr0NW-_CrkNIvQ1bX1XN24c_vlf8T5ffAkPDp0M</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1873291517</pqid></control><display><type>article</type><title>ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central Open Access</source><source>Springer Nature OA Free Journals</source><source>PubMed Central</source><source>SpringerLink Journals - AutoHoldings</source><creator>Alborzi, Seyed Ziaeddin ; Devignes, Marie-Dominique ; Ritchie, David W</creator><creatorcontrib>Alborzi, Seyed Ziaeddin ; Devignes, Marie-Dominique ; Ritchie, David W</creatorcontrib><description>Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. 
However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current on-line resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level. This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with a F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, ECDomainMiner infers a 13-fold increase in the number of EC-Pfam associations. These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. 
The ECDomainMiner database is publicly available at http://ecdm.loria.fr/ .</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/s12859-017-1519-x</identifier><identifier>PMID: 28193156</identifier><language>eng</language><publisher>England: BioMed Central</publisher><subject>Computational Biology - methods ; Data Mining - methods ; Databases, Protein ; Enzymes - chemistry ; Enzymes - genetics ; Enzymes - metabolism ; Life Sciences ; Proteins - chemistry ; Proteins - genetics ; Proteins - metabolism</subject><ispartof>BMC bioinformatics, 2017-02, Vol.18 (1), p.107-107, Article 107</ispartof><rights>Copyright BioMed Central 2017</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><rights>The Author(s) 2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c461t-c98533badb26d5a257d8b754e1caaa4bd3728ade49e09608b60b33ffc2f81cad3</citedby><cites>FETCH-LOGICAL-c461t-c98533badb26d5a257d8b754e1caaa4bd3728ade49e09608b60b33ffc2f81cad3</cites><orcidid>0000-0002-0399-8713 ; 0000-0002-0906-7354</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5307852/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5307852/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27923,27924,53790,53792</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28193156$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://inria.hal.science/hal-01466842$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Alborzi, Seyed 
Ziaeddin</creatorcontrib><creatorcontrib>Devignes, Marie-Dominique</creatorcontrib><creatorcontrib>Ritchie, David W</creatorcontrib><title>ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains</title><title>BMC bioinformatics</title><addtitle>BMC Bioinformatics</addtitle><description>Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current on-line resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level. This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with a F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, ECDomainMiner infers a 13-fold increase in the number of EC-Pfam associations. These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. 
The ECDomainMiner database is publicly available at http://ecdm.loria.fr/ .</description><subject>Computational Biology - methods</subject><subject>Data Mining - methods</subject><subject>Databases, Protein</subject><subject>Enzymes - chemistry</subject><subject>Enzymes - genetics</subject><subject>Enzymes - metabolism</subject><subject>Life Sciences</subject><subject>Proteins - chemistry</subject><subject>Proteins - genetics</subject><subject>Proteins - metabolism</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNpdkU1PFTEUhhsjEUR_gBvTxI0uRvoxnXZcmJArCsk1utB1048z3JI7LbYzV-DX0-tFAqzanPO8b0_Pi9AbSj5SqrqjQpkSfUOobKigfXP1DB3QVtKGUSKeP7jvo5elXJAKKiJeoH2maM-p6A6QOVl8SaMJ8XuIkD9hH4pLG8ghnuNV8B4iNqUkF8wUUizYwvQXahHizfUI2KVxDKXUFo7zaCEXbKLHPwczYv_Pt7xCe4NZF3h9dx6i319Pfi1Om-WPb2eL42Xj2o5OjeuV4Nwab1nnhWFCemWlaIE6Y0xrPZdMGQ9tD6TviLIdsZwPg2ODqojnh-jzzvdytiN4B3HKZq0vcxhNvtbJBP24E8NKn6eNFpxIJVg1-LAzWD2RnR4v9bZGaNt1qmUbWtn3d4_l9GeGMum6BgfrtYmQ5qJrOor3rRSyou-eoBdpzrGuolKSs75Gt6XojnI5lZJhuJ-AEr0NW-_CrkNIvQ1bX1XN24c_vlf8T5ffAkPDp0M</recordid><startdate>20170213</startdate><enddate>20170213</enddate><creator>Alborzi, Seyed Ziaeddin</creator><creator>Devignes, Marie-Dominique</creator><creator>Ritchie, David W</creator><general>BioMed 
Central</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>1XC</scope><scope>VOOES</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-0399-8713</orcidid><orcidid>https://orcid.org/0000-0002-0906-7354</orcidid></search><sort><creationdate>20170213</creationdate><title>ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains</title><author>Alborzi, Seyed Ziaeddin ; Devignes, Marie-Dominique ; Ritchie, David W</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c461t-c98533badb26d5a257d8b754e1caaa4bd3728ade49e09608b60b33ffc2f81cad3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Computational Biology - methods</topic><topic>Data Mining - methods</topic><topic>Databases, Protein</topic><topic>Enzymes - chemistry</topic><topic>Enzymes - 
genetics</topic><topic>Enzymes - metabolism</topic><topic>Life Sciences</topic><topic>Proteins - chemistry</topic><topic>Proteins - genetics</topic><topic>Proteins - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Alborzi, Seyed Ziaeddin</creatorcontrib><creatorcontrib>Devignes, Marie-Dominique</creatorcontrib><creatorcontrib>Ritchie, David W</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science 
Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Alborzi, Seyed Ziaeddin</au><au>Devignes, Marie-Dominique</au><au>Ritchie, David W</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains</atitle><jtitle>BMC bioinformatics</jtitle><addtitle>BMC Bioinformatics</addtitle><date>2017-02-13</date><risdate>2017</risdate><volume>18</volume><issue>1</issue><spage>107</spage><epage>107</epage><pages>107-107</pages><artnum>107</artnum><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current on-line resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level. This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with a F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, ECDomainMiner infers a 13-fold increase in the number of EC-Pfam associations. These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. 
The ECDomainMiner database is publicly available at http://ecdm.loria.fr/ .</abstract><cop>England</cop><pub>BioMed Central</pub><pmid>28193156</pmid><doi>10.1186/s12859-017-1519-x</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-0399-8713</orcidid><orcidid>https://orcid.org/0000-0002-0906-7354</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2105
ispartof BMC bioinformatics, 2017-02, Vol.18 (1), p.107-107, Article 107
issn 1471-2105
1471-2105
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5307852
source MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; Springer Nature OA Free Journals; PubMed Central; SpringerLink Journals - AutoHoldings
subjects Computational Biology - methods
Data Mining - methods
Databases, Protein
Enzymes - chemistry
Enzymes - genetics
Enzymes - metabolism
Life Sciences
Proteins - chemistry
Proteins - genetics
Proteins - metabolism
title ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T21%3A13%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ECDomainMiner:%20discovering%20hidden%20associations%20between%20enzyme%20commission%20numbers%20and%20Pfam%20domains&rft.jtitle=BMC%20bioinformatics&rft.au=Alborzi,%20Seyed%20Ziaeddin&rft.date=2017-02-13&rft.volume=18&rft.issue=1&rft.spage=107&rft.epage=107&rft.pages=107-107&rft.artnum=107&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-017-1519-x&rft_dat=%3Cproquest_pubme%3E4317819021%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1873291517&rft_id=info:pmid/28193156&rfr_iscdi=true
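
The resolver link above encodes the citation as key/value pairs in its query string (keys such as `rft.atitle`, `rft.jtitle` and `rft_id`). As a minimal sketch, the Python standard library can decode these pairs back into readable fields. The URL below is a shortened excerpt of the link in this record, and the `first` helper and `citation` dictionary are illustrative names, not part of any resolver API.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened excerpt of the resolver URL in this record (many keys omitted).
url = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=ECDomainMiner:%20discovering%20hidden%20associations"
    "%20between%20enzyme%20commission%20numbers%20and%20Pfam%20domains"
    "&rft.jtitle=BMC%20bioinformatics"
    "&rft.au=Alborzi,%20Seyed%20Ziaeddin"
    "&rft.date=2017-02-13&rft.volume=18&rft.issue=1"
    "&rft.issn=1471-2105"
    "&rft_id=info:doi/10.1186/s12859-017-1519-x"
    "&rft_id=info:pmid/28193156"
)

# parse_qs percent-decodes values and groups repeated keys into lists.
query = parse_qs(urlsplit(url).query)


def first(key):
    """Return the first value for a key, or None if the key is absent."""
    values = query.get(key)
    return values[0] if values else None


citation = {
    "title": first("rft.atitle"),
    "journal": first("rft.jtitle"),
    "author": first("rft.au"),
    "date": first("rft.date"),
    "volume": first("rft.volume"),
    "issue": first("rft.issue"),
    "issn": first("rft.issn"),
}

# rft_id occurs more than once, so scan all of its values for the DOI.
doi_prefix = "info:doi/"
doi = next(
    (v[len(doi_prefix):] for v in query.get("rft_id", []) if v.startswith(doi_prefix)),
    None,
)

for field, value in citation.items():
    print(f"{field}: {value}")
print(f"doi: {doi}")
```

Because `rft_id` is repeated in the link, a helper that returns only the first value would miss the PubMed identifier, which is why the DOI lookup iterates over the whole list.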