A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM
Published in: | Computational biology and chemistry 2011-02, Vol.35 (1), p.1-9 |
---|---|
Main authors: | Kavousi, Kaveh; Moshiri, Behzad; Sadeghi, Mehdi; Araabi, Babak N.; Moosavi-Movahedi, Ali Akbar |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 9 |
---|---|
container_issue | 1 |
container_start_page | 1 |
container_title | Computational biology and chemistry |
container_volume | 35 |
creator | Kavousi, Kaveh; Moshiri, Behzad; Sadeghi, Mehdi; Araabi, Babak N.; Moosavi-Movahedi, Ali Akbar |
description | ▶ In this study we use a combined classifier to identify protein domain folds. ▶ The information content of features extracted from the protein is introduced to address this problem. ▶ We show that the position-specific scoring matrix improves the correct classification rate. ▶ The results provide deeper insight into the effectiveness of each feature for discrimination.
Protein function is related to how a protein reacts chemically with its surrounding environment, including other proteins. This behaviour in turn depends on the protein's spatial shape, its tertiary structure, and the way its constituent components fold in space. Correctly identifying a protein's domain fold solely from information extracted from its sequence is a complicated and controversial task in current computational biology. To address this challenging problem, this article introduces a combined classifier based on the information content of features extracted from the protein's primary structure. The first stage of the proposed two-tier architecture consists of several classifiers, each trained on a different sequence-based feature vector. Besides the features used in similar studies (predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, and pseudo-amino acid composition vectors of different dimensions), this study also uses the position-specific scoring matrix (PSSM) to improve the correct classification rate (CCR). Using K-fold cross-validation on a training dataset covering 27 well-known SCOP folds, the 28-dimensional probability output vector of each evidence-theoretic K-NN classifier is used to determine the information content, or expertness, of the corresponding feature for discriminating each fold class. In the second stage, the classifiers' outputs on the test dataset are fused with the Sugeno fuzzy integral operator to reach a better decision on the target fold class. The expertness factor of each classifier in each fold class is used to calculate the weights of the fuzzy integral operator. The results also allow a deeper interpretation of how effective each feature is at discriminating the target classes for query proteins. |
doi_str_mv | 10.1016/j.compbiolchem.2010.12.001 |
format | Article |
publisher | Elsevier Ltd (England) |
pmid | 21216672 |
els_id | S1476927110001106 |
addtitle | Comput Biol Chem |
rights | Copyright © 2010 Elsevier Ltd. All rights reserved. |
peer_reviewed | yes |
linktohtml | https://www.sciencedirect.com/science/article/pii/S1476927110001106 |
backlink | https://www.ncbi.nlm.nih.gov/pubmed/21216672 (View this record in MEDLINE/PubMed) |
fulltext | fulltext |
identifier | ISSN: 1476-9271 |
ispartof | Computational biology and chemistry, 2011-02, Vol.35 (1), p.1-9 |
issn | 1476-9271; 1476-928X |
language | eng |
recordid | cdi_proquest_miscellaneous_869832210 |
source | MEDLINE; Elsevier ScienceDirect Journals |
subjects | Amino Acid Sequence; Biology; Classifiers; Combined classifier; Computer Simulation; Fuzzy; Fuzzy set theory; Information content; Mathematical analysis; Molecular Sequence Data; Operators; Position specific scoring matrix; Position-Specific Scoring Matrices; Protein fold classification; Protein Folding; Protein Structure, Tertiary; Proteins; Proteins - chemistry; Proteins - classification; Proteins - genetics; Sequence based feature; Vectors (mathematics) |
title | A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T12%3A03%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20protein%20fold%20classifier%20formed%20by%20fusing%20different%20modes%20of%20pseudo%20amino%20acid%20composition%20via%20PSSM&rft.jtitle=Computational%20biology%20and%20chemistry&rft.au=Kavousi,%20Kaveh&rft.date=2011-02-01&rft.volume=35&rft.issue=1&rft.spage=1&rft.epage=9&rft.pages=1-9&rft.issn=1476-9271&rft.eissn=1476-928X&rft_id=info:doi/10.1016/j.compbiolchem.2010.12.001&rft_dat=%3Cproquest_cross%3E855202910%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=855202910&rft_id=info:pmid/21216672&rft_els_id=S1476927110001106&rfr_iscdi=true |
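The title and abstract refer to "different modes" (dimensions) of pseudo amino acid composition. As a rough illustration only, here is a minimal Python sketch of Chou-style type-1 PseAAC, which appends `lam` sequence-order correlation factors to the 20 amino-acid frequencies. The function names, the weight `w = 0.05` and the property tables are assumptions: this record does not state which properties or `lam` values the authors used, and the toy property table in the usage line is hypothetical, not real physicochemical data.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Rescale a 20-residue property table to zero mean and unit population std."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mu = statistics.fmean(vals)
    sd = statistics.pstdev(vals)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq, props, lam=3, w=0.05):
    """Chou-style type-1 pseudo amino acid composition with 20 + lam components.

    props: list of dicts mapping each of the 20 residues to a property value.
    lam:   number of sequence-order correlation tiers (the PseAAC "mode"/dimension).
    w:     weight of the sequence-order factors (0.05 is an assumed default).
    """
    seq = seq.upper()
    if not props:
        raise ValueError("need at least one property table")
    if set(seq) - set(AMINO_ACIDS):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    std_props = [standardize(p) for p in props]
    # Normalized occurrence frequencies of the 20 native residues.
    freq = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    # theta_j: mean squared (standardized) property difference between residues j apart.
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(len(seq) - j):
            a, b = seq[i], seq[i + j]
            total += sum((p[b] - p[a]) ** 2 for p in std_props) / len(std_props)
        thetas.append(total / (len(seq) - j))
    denom = sum(freq) + w * sum(thetas)
    return [f / denom for f in freq] + [w * t / denom for t in thetas]
```

For example, with a hypothetical table `toy = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}`, `pseaac("ACDEFGHIKLMNPQRSTVWY", [toy], lam=3)` yields a 23-component vector whose entries sum to 1; varying `lam` gives PseAAC vectors of different dimensions.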
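The abstract adds the position-specific scoring matrix (PSSM) as a further feature source, but this record does not say how the L x 20 matrix is reduced to a fixed-length vector. One common choice in fold-recognition work, assumed here and not confirmed as the authors' method, is to squash each score with a logistic sigmoid and average every column over the sequence, giving 20 values. The sketch assumes the matrix has already been parsed (for example from PSI-BLAST output) into a list of rows; the function names are illustrative.

```python
import math


def _sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def pssm_features(pssm):
    """Reduce an L x 20 PSSM to 20 values by sigmoid-squashing and column averaging.

    pssm: list of L rows, each holding the 20 log-odds scores for one position
          (the column order produced by the parser is kept unchanged).
    """
    if not pssm or any(len(row) != 20 for row in pssm):
        raise ValueError("expected a non-empty list of 20-column rows")
    n = len(pssm)
    return [sum(_sigmoid(row[c]) for row in pssm) / n for c in range(20)]
```

Averaging makes the vector independent of sequence length, which is what lets proteins of different lengths share one K-NN feature space; the price is that positional detail in the PSSM is discarded.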
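Each first-stage classifier in the abstract is an evidence-theoretic K-NN. A minimal sketch of Denoeux's evidential K-NN rule follows: each of the k nearest neighbours with label q supports the singleton {w_q} with mass `alpha * exp(-gamma * d2)`, leaves the rest on the whole frame Omega, and the k mass functions are combined with Dempster's rule. For 27 folds this returns 27 singleton masses plus one mass on Omega, which is consistent with the 28-dimensional output mentioned in the abstract, although that correspondence is our reading rather than something this record states. The single shared `gamma`, the values `alpha = 0.95` and `gamma = 1.0`, and the function name are simplifying assumptions (the original rule tunes one gamma per class).

```python
import math


def evidential_knn(x, train_X, train_y, n_classes, k=5, alpha=0.95, gamma=1.0):
    """Return [m({w_1}), ..., m({w_n}), m(Omega)] for query x.

    Each of the k nearest neighbours (squared Euclidean distance d2) with label q
    gives mass alpha*exp(-gamma*d2) to {w_q} and the remainder to Omega; the k
    simple mass functions are combined with Dempster's rule (conflict normalized).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    neighbours = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
        for xi, yi in zip(train_X, train_y)
    )[:k]
    # rest[q] = m_q(Omega): product of (1 - alpha*phi) over neighbours labelled q.
    rest = [1.0] * n_classes
    for d2, q in neighbours:
        rest[q] *= 1.0 - alpha * math.exp(-gamma * d2)
    unnormalized = []
    for q in range(n_classes):
        others = math.prod(rest[r] for r in range(n_classes) if r != q)
        unnormalized.append((1.0 - rest[q]) * others)  # only class q commits
    unnormalized.append(math.prod(rest))  # nobody commits: mass stays on Omega
    total = sum(unnormalized)  # equals 1 - conflict; > 0 because alpha < 1
    return [m / total for m in unnormalized]
```

A query can then be assigned to the class with the largest singleton mass, while the last entry, m(Omega), indicates how little the neighbours support any single class.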
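In the second stage the abstract fuses classifier outputs with the Sugeno fuzzy integral, using each classifier's per-class expertness to set the integral's weights. A minimal sketch, assuming those weights act as fuzzy densities strictly between 0 and 1 and using the standard Sugeno lambda-fuzzy measure (lambda is the root of prod(1 + lambda*g_i) = 1 + lambda in (-1, inf), lambda != 0). How expertness is derived from cross-validation is not specified in this record, so the densities in the usage example are hypothetical placeholders, and the function names are illustrative.

```python
import math


def solve_lambda(g, iters=200):
    """Sugeno lambda for densities g: root of prod(1 + lam*g_i) = 1 + lam, lam > -1.

    Returns 0.0 when the densities already sum to 1 (additive measure).
    """
    if len(g) < 2 or not all(0.0 < gi < 1.0 for gi in g):
        raise ValueError("need at least two densities strictly between 0 and 1")
    s = sum(g)
    if math.isclose(s, 1.0):
        return 0.0

    def f(lam):
        return math.prod(1.0 + lam * gi for gi in g) - (1.0 + lam)

    if s < 1.0:  # unique nonzero root in (0, inf): grow the bracket until f > 0
        lo, hi = 0.0, 1.0
        while f(hi) <= 0.0:
            hi *= 2.0
    else:  # unique nonzero root in (-1, 0)
        lo, hi = -1.0, 0.0
    for _ in range(iters):  # plain bisection
        mid = 0.5 * (lo + hi)
        # f < 0 between 0 and the root if s < 1, and between the root and 0 if s > 1
        if (f(mid) < 0.0) == (s < 1.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sugeno_integral(h, g, lam):
    """Sugeno fuzzy integral of supports h w.r.t. the lambda-measure with densities g."""
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    g_acc = 0.0  # g(A_k) for the k classifiers with the highest support so far
    best = 0.0
    for i in order:
        g_acc = g[i] + g_acc + lam * g[i] * g_acc
        best = max(best, min(h[i], g_acc))
    return best


def fuse(outputs, expertness):
    """outputs[i][j]: support of classifier i for class j; expertness[i][j]: its density.

    Returns one fused score per class; the predicted class is the argmax.
    """
    n_clf, n_cls = len(outputs), len(outputs[0])
    scores = []
    for j in range(n_cls):
        h = [outputs[i][j] for i in range(n_clf)]
        g = [expertness[i][j] for i in range(n_clf)]
        scores.append(sugeno_integral(h, g, solve_lambda(g)))
    return scores
```

For example, `fuse([[0.9, 0.1], [0.6, 0.3]], [[0.3, 0.6], [0.4, 0.8]])` scores class 0 at 0.6 and class 1 at 0.3, so class 0 is chosen. Because lambda is solved separately for each class, a classifier that is expert in one fold class can dominate there without dominating elsewhere.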