ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains
Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins of...
Published in: | BMC bioinformatics 2017-02, Vol.18 (1), p.107-107, Article 107 |
---|---|
Main authors: | Alborzi, Seyed Ziaeddin; Devignes, Marie-Dominique; Ritchie, David W |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 107 |
---|---|
container_issue | 1 |
container_start_page | 107 |
container_title | BMC bioinformatics |
container_volume | 18 |
creator | Alborzi, Seyed Ziaeddin Devignes, Marie-Dominique Ritchie, David W |
description | Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current on-line resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level.
This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with an F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, ECDomainMiner infers a 13-fold increase in the number of EC-Pfam associations.
These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. The ECDomainMiner database is publicly available at http://ecdm.loria.fr/ . |
doi_str_mv | 10.1186/s12859-017-1519-x |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5307852</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>4317819021</sourcerecordid><originalsourceid>FETCH-LOGICAL-c461t-c98533badb26d5a257d8b754e1caaa4bd3728ade49e09608b60b33ffc2f81cad3</originalsourceid><addsrcrecordid>eNpdkU1PFTEUhhsjEUR_gBvTxI0uRvoxnXZcmJArCsk1utB1048z3JI7LbYzV-DX0-tFAqzanPO8b0_Pi9AbSj5SqrqjQpkSfUOobKigfXP1DB3QVtKGUSKeP7jvo5elXJAKKiJeoH2maM-p6A6QOVl8SaMJ8XuIkD9hH4pLG8ghnuNV8B4iNqUkF8wUUizYwvQXahHizfUI2KVxDKXUFo7zaCEXbKLHPwczYv_Pt7xCe4NZF3h9dx6i319Pfi1Om-WPb2eL42Xj2o5OjeuV4Nwab1nnhWFCemWlaIE6Y0xrPZdMGQ9tD6TviLIdsZwPg2ODqojnh-jzzvdytiN4B3HKZq0vcxhNvtbJBP24E8NKn6eNFpxIJVg1-LAzWD2RnR4v9bZGaNt1qmUbWtn3d4_l9GeGMum6BgfrtYmQ5qJrOor3rRSyou-eoBdpzrGuolKSs75Gt6XojnI5lZJhuJ-AEr0NW-_CrkNIvQ1bX1XN24c_vlf8T5ffAkPDp0M</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1873291517</pqid></control><display><type>article</type><title>ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central Open Access</source><source>Springer Nature OA Free Journals</source><source>PubMed Central</source><source>SpringerLink Journals - AutoHoldings</source><creator>Alborzi, Seyed Ziaeddin ; Devignes, Marie-Dominique ; Ritchie, David W</creator><creatorcontrib>Alborzi, Seyed Ziaeddin ; Devignes, Marie-Dominique ; Ritchie, David W</creatorcontrib><description>Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. 
However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current on-line resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level.
This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with a F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, ECDomainMiner infers a 13-fold increase in the number of EC-Pfam associations.
These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. The ECDomainMiner database is publicly available at http://ecdm.loria.fr/ .</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/s12859-017-1519-x</identifier><identifier>PMID: 28193156</identifier><language>eng</language><publisher>England: BioMed Central</publisher><subject>Computational Biology - methods ; Data Mining - methods ; Databases, Protein ; Enzymes - chemistry ; Enzymes - genetics ; Enzymes - metabolism ; Life Sciences ; Proteins - chemistry ; Proteins - genetics ; Proteins - metabolism</subject><ispartof>BMC bioinformatics, 2017-02, Vol.18 (1), p.107-107, Article 107</ispartof><rights>Copyright BioMed Central 2017</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><rights>The Author(s) 2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c461t-c98533badb26d5a257d8b754e1caaa4bd3728ade49e09608b60b33ffc2f81cad3</citedby><cites>FETCH-LOGICAL-c461t-c98533badb26d5a257d8b754e1caaa4bd3728ade49e09608b60b33ffc2f81cad3</cites><orcidid>0000-0002-0399-8713 ; 0000-0002-0906-7354</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5307852/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5307852/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27923,27924,53790,53792</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28193156$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://inria.hal.science/hal-01466842$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Alborzi, Seyed Ziaeddin</creatorcontrib><creatorcontrib>Devignes, Marie-Dominique</creatorcontrib><creatorcontrib>Ritchie, David W</creatorcontrib><title>ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains</title><title>BMC bioinformatics</title><addtitle>BMC Bioinformatics</addtitle><description>Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current on-line resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level.
This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with a F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, ECDomainMiner infers a 13-fold increase in the number of EC-Pfam associations.
These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. The ECDomainMiner database is publicly available at http://ecdm.loria.fr/ .</description><subject>Computational Biology - methods</subject><subject>Data Mining - methods</subject><subject>Databases, Protein</subject><subject>Enzymes - chemistry</subject><subject>Enzymes - genetics</subject><subject>Enzymes - metabolism</subject><subject>Life Sciences</subject><subject>Proteins - chemistry</subject><subject>Proteins - genetics</subject><subject>Proteins - metabolism</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNpdkU1PFTEUhhsjEUR_gBvTxI0uRvoxnXZcmJArCsk1utB1048z3JI7LbYzV-DX0-tFAqzanPO8b0_Pi9AbSj5SqrqjQpkSfUOobKigfXP1DB3QVtKGUSKeP7jvo5elXJAKKiJeoH2maM-p6A6QOVl8SaMJ8XuIkD9hH4pLG8ghnuNV8B4iNqUkF8wUUizYwvQXahHizfUI2KVxDKXUFo7zaCEXbKLHPwczYv_Pt7xCe4NZF3h9dx6i319Pfi1Om-WPb2eL42Xj2o5OjeuV4Nwab1nnhWFCemWlaIE6Y0xrPZdMGQ9tD6TviLIdsZwPg2ODqojnh-jzzvdytiN4B3HKZq0vcxhNvtbJBP24E8NKn6eNFpxIJVg1-LAzWD2RnR4v9bZGaNt1qmUbWtn3d4_l9GeGMum6BgfrtYmQ5qJrOor3rRSyou-eoBdpzrGuolKSs75Gt6XojnI5lZJhuJ-AEr0NW-_CrkNIvQ1bX1XN24c_vlf8T5ffAkPDp0M</recordid><startdate>20170213</startdate><enddate>20170213</enddate><creator>Alborzi, Seyed Ziaeddin</creator><creator>Devignes, Marie-Dominique</creator><creator>Ritchie, David W</creator><general>BioMed 
Central</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>1XC</scope><scope>VOOES</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-0399-8713</orcidid><orcidid>https://orcid.org/0000-0002-0906-7354</orcidid></search><sort><creationdate>20170213</creationdate><title>ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains</title><author>Alborzi, Seyed Ziaeddin ; Devignes, Marie-Dominique ; Ritchie, David W</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c461t-c98533badb26d5a257d8b754e1caaa4bd3728ade49e09608b60b33ffc2f81cad3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Computational Biology - methods</topic><topic>Data Mining - methods</topic><topic>Databases, Protein</topic><topic>Enzymes - chemistry</topic><topic>Enzymes - 
genetics</topic><topic>Enzymes - metabolism</topic><topic>Life Sciences</topic><topic>Proteins - chemistry</topic><topic>Proteins - genetics</topic><topic>Proteins - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Alborzi, Seyed Ziaeddin</creatorcontrib><creatorcontrib>Devignes, Marie-Dominique</creatorcontrib><creatorcontrib>Ritchie, David W</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science 
Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Alborzi, Seyed Ziaeddin</au><au>Devignes, Marie-Dominique</au><au>Ritchie, David W</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains</atitle><jtitle>BMC bioinformatics</jtitle><addtitle>BMC Bioinformatics</addtitle><date>2017-02-13</date><risdate>2017</risdate><volume>18</volume><issue>1</issue><spage>107</spage><epage>107</epage><pages>107-107</pages><artnum>107</artnum><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>Many entries in the protein data bank (PDB) are annotated to show their component protein domains according to the Pfam classification, as well as their biological function through the enzyme commission (EC) numbering scheme. However, despite the fact that the biological activity of many proteins often arises from specific domain-domain and domain-ligand interactions, current on-line resources rarely provide a direct mapping from structure to function at the domain level. Since the PDB now contains many tens of thousands of protein chains, and since protein sequence databases can dwarf such numbers by orders of magnitude, there is a pressing need to develop automatic structure-function annotation tools which can operate at the domain level.
This article presents ECDomainMiner, a novel content-based filtering approach to automatically infer associations between EC numbers and Pfam domains. ECDomainMiner finds a total of 20,728 non-redundant EC-Pfam associations with a F-measure of 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to the 1515 manually curated EC-Pfam associations in InterPro, ECDomainMiner infers a 13-fold increase in the number of EC-Pfam associations.
These EC-Pfam associations could be used to annotate some 58,722 protein chains in the PDB which currently lack any EC annotation. The ECDomainMiner database is publicly available at http://ecdm.loria.fr/ .</abstract><cop>England</cop><pub>BioMed Central</pub><pmid>28193156</pmid><doi>10.1186/s12859-017-1519-x</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-0399-8713</orcidid><orcidid>https://orcid.org/0000-0002-0906-7354</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1471-2105 |
ispartof | BMC bioinformatics, 2017-02, Vol.18 (1), p.107-107, Article 107 |
issn | 1471-2105 1471-2105 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5307852 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central Open Access; Springer Nature OA Free Journals; PubMed Central; SpringerLink Journals - AutoHoldings |
subjects | Computational Biology - methods Data Mining - methods Databases, Protein Enzymes - chemistry Enzymes - genetics Enzymes - metabolism Life Sciences Proteins - chemistry Proteins - genetics Proteins - metabolism |
title | ECDomainMiner: discovering hidden associations between enzyme commission numbers and Pfam domains |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T21%3A13%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ECDomainMiner:%20discovering%20hidden%20associations%20between%20enzyme%20commission%20numbers%20and%20Pfam%20domains&rft.jtitle=BMC%20bioinformatics&rft.au=Alborzi,%20Seyed%20Ziaeddin&rft.date=2017-02-13&rft.volume=18&rft.issue=1&rft.spage=107&rft.epage=107&rft.pages=107-107&rft.artnum=107&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-017-1519-x&rft_dat=%3Cproquest_pubme%3E4317819021%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1873291517&rft_id=info:pmid/28193156&rfr_iscdi=true |
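
The two headline figures in the abstract, the F-measure of 0.95 and the 13-fold increase, can be read with a small arithmetic sketch. The F-measure is assumed here to be the standard F1 score (the harmonic mean of precision and recall); the record does not say which variant the authors used, and it does not report the underlying counts, so the true-positive, false-positive, and false-negative values below are purely hypothetical. Only the totals 20,728 and 1515 come from the abstract.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Standard F1 score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fold_ratio(inferred: int, curated: int) -> float:
    """Ratio of inferred associations to manually curated ones."""
    return inferred / curated


# Totals quoted in the abstract.
INFERRED_ASSOCIATIONS = 20728
CURATED_ASSOCIATIONS = 1515

# Hypothetical confusion counts, chosen only to illustrate how an
# F-measure of 0.95 can arise; they are NOT taken from the paper.
example_f = f_measure(tp=95, fp=5, fn=5)

ratio = fold_ratio(INFERRED_ASSOCIATIONS, CURATED_ASSOCIATIONS)

print(f"example F-measure: {example_f:.2f}")
print(f"inferred / curated: {ratio:.2f}")
```

The ratio of the two totals is about 13.7, which is consistent with the "13-fold increase" stated in the abstract.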