A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM

Bibliographic details
Published in: Computational biology and chemistry 2011-02, Vol.35 (1), p.1-9
Main authors: Kavousi, Kaveh; Moshiri, Behzad; Sadeghi, Mehdi; Araabi, Babak N.; Moosavi-Movahedi, Ali Akbar
Format: Article
Language: eng
container_end_page 9
container_issue 1
container_start_page 1
container_title Computational biology and chemistry
container_volume 35
creator Kavousi, Kaveh
Moshiri, Behzad
Sadeghi, Mehdi
Araabi, Babak N.
Moosavi-Movahedi, Ali Akbar
description ▶ This study uses a combined classifier to identify protein domain folds. ▶ The information content of features extracted from the protein is introduced to address this problem. ▶ The position-specific scoring matrix is shown to improve the correct classification rate. ▶ The results allow a deeper interpretation of how effective each feature is for discrimination.
A protein's function is related to its chemical reactions with the surrounding environment, including other proteins. This in turn depends on the spatial shape and tertiary structure of the protein and on how its constituent components fold in space. Correctly identifying the fold of a protein domain using only information extracted from the protein sequence is a complicated and controversial task in current computational biology. This article introduces a combined classifier, based on the information content of features extracted from the primary structure of the protein, to address this challenging problem. The first stage of the proposed two-tier architecture consists of several classifiers, each trained with a different sequence-based feature vector. In addition to the predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, and pseudo-amino acid composition vectors of different dimensions, which have been applied in similar studies, the position-specific scoring matrix (PSSM) is also used in this study to improve the correct classification rate (CCR). Using K-fold cross-validation on a training dataset covering 27 well-known SCOP folds, the 28-dimensional probability output vector of each evidence-theoretic K-NN classifier is used to determine the information content, or expertness, of the corresponding feature for discrimination in each fold class. In the second stage, the classifier outputs for the test dataset are fused with the Sugeno fuzzy integral operator to reach a better decision on the target fold class. The expertness factor of each classifier in each fold class is used to calculate the weights of the fuzzy integral operator. The results make it possible to interpret more deeply how effective each feature is for discrimination in the target classes of query proteins.
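The second-stage fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the commonly used Sugeno lambda-measure as the fuzzy measure, treats each classifier's per-class expertness as a fuzzy density in (0, 1), and uses hypothetical function names (`sugeno_lambda`, `sugeno_integral`, `fuse`) and made-up toy numbers.

```python
from math import prod


def sugeno_lambda(densities, iterations=200):
    """Solve prod(1 + lam * g_i) = 1 + lam for the nonzero root lam > -1."""
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities, each strictly in (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities sum to 1, so the measure is additive

    def f(lam):
        return prod(1.0 + lam * g for g in densities) - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, 0.0  # root lies in (-1, 0); f is positive left of it
    else:
        lo, hi = 0.0, 1.0   # root lies in (0, inf); f is negative left of it
        while f(hi) < 0.0:
            hi *= 2.0
    for _ in range(iterations):  # plain bisection on the bracket
        mid = (lo + hi) / 2.0
        if (f(mid) > 0.0) == (total > 1.0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sugeno_integral(scores, densities):
    """Sugeno fuzzy integral of the scores w.r.t. the lambda-measure."""
    lam = sugeno_lambda(densities)
    # Visit sources from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    g_cum, best = 0.0, 0.0
    for i in order:
        # Measure of the set of sources visited so far.
        g_cum = densities[i] + g_cum + lam * densities[i] * g_cum
        best = max(best, min(scores[i], g_cum))
    return best


def fuse(outputs, expertness):
    """outputs[i][c]: score of classifier i for class c (assumed in [0, 1]).
    expertness[i][c]: assumed fuzzy density of classifier i for class c.
    Returns (index of the winning class, list of fused scores)."""
    n_classes = len(outputs[0])
    fused = [
        sugeno_integral([o[c] for o in outputs], [e[c] for e in expertness])
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=fused.__getitem__), fused


# Toy example: two classifiers, two fold classes (illustrative values only).
winner, fused_scores = fuse(
    outputs=[[0.9, 0.1], [0.3, 0.7]],
    expertness=[[0.5, 0.2], [0.5, 0.3]],
)
```

In this toy case the fused scores are 0.5 for the first class and 0.3 for the second, so the first class wins. Because the densities are per class, a classifier that is reliable for one fold can dominate the decision there while contributing little for other folds.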
doi_str_mv 10.1016/j.compbiolchem.2010.12.001
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_869832210</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1476927110001106</els_id><sourcerecordid>855202910</sourcerecordid><originalsourceid>FETCH-LOGICAL-c411t-7c00cafd62ae6b782d8a9291e5ac215153cfb90b9a6bb5b083b87c46d66927643</originalsourceid><addsrcrecordid>eNqNkUtLAzEUhYMoVqt_QYIbV61JppOZcSe-QVGogruQx42mzExqMlPovzeltbjTTR7ku_fknoPQKSVjSig_n421b-bK-Vp_QjNmZPXAxoTQHXRAJwUfVax8392eCzpAhzHOCGEZIfk-GjDKKOcFO0DmEs-D78C12PraYF3LGJ11ENI9NGCwWmLbR9d-YOOshQBthxtvIGJv8TxCbzyWjWvTql1qkL7mo-ucb_HCSfwynT4doT0r6wjHm32I3m5vXq_uR4_Pdw9Xl48jPaG0GxWaEC2t4UwCV0XJTCkrVlHIpWY0p3mmraqIqiRXKlekzFRZ6Ak3nKch-SQborN13zTSVw-xE42LGupatuD7KEpelRljlPxN5jkjSXpFXqxJHXyMAayYB9fIsBSUiFUcYiZ-xyFWcQjKRIojFZ9sZHqVvNyW_vifgOs1AMmWRXJdRO2g1WBcAN0J491_dL4BZWmi1g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>855202910</pqid></control><display><type>article</type><title>A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Kavousi, Kaveh ; Moshiri, Behzad ; Sadeghi, Mehdi ; Araabi, Babak N. ; Moosavi-Movahedi, Ali Akbar</creator><creatorcontrib>Kavousi, Kaveh ; Moshiri, Behzad ; Sadeghi, Mehdi ; Araabi, Babak N. ; Moosavi-Movahedi, Ali Akbar</creatorcontrib><description>. [Display omitted] ▶ In this study we use combined classifier for identification of protein domain fold. ▶ Information content of extracted features of protein has been introduced to face this problem. ▶ We show that position specific scoring matrix improves the correct classification rate. ▶ Results provide deeper interpretation about the effectiveness of each feature for discrimination. Protein function is related to its chemical reaction to the surrounding environment including other proteins. 
On the other hand, this depends on the spatial shape and tertiary structure of protein and folding of its constituent components in space. The correct identification of protein domain fold solely using extracted information from protein sequence is a complicated and controversial task in the current computational biology. In this article a combined classifier based on the information content of extracted features from the primary structure of protein has been introduced to face this challenging problem. In the first stage of our proposed two-tier architecture, there are several classifiers each of which is trained with a different sequence based feature vector. Apart from the application of the predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, and different dimensions of pseudo-amino acid composition vectors in similar studies, the position specific scoring matrix (PSSM) has also been used to improve the correct classification rate (CCR) in this study. Using K-fold cross validation on training dataset related to 27 famous folds of SCOP, the 28 dimensional probability output vector from each evidence theoretic K-NN classifier is used to determine the information content or expertness of corresponding feature for discrimination in each fold class. In the second stage, the outputs of classifiers for test dataset are fused using Sugeno fuzzy integral operator to make better decision for target fold class. The expertness factor of each classifier in each fold class has been used to calculate the fuzzy integral operator weights. 
Results make it possible to provide deeper interpretation about the effectiveness of each feature for discrimination in target classes for query proteins.</description><identifier>ISSN: 1476-9271</identifier><identifier>EISSN: 1476-928X</identifier><identifier>DOI: 10.1016/j.compbiolchem.2010.12.001</identifier><identifier>PMID: 21216672</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Amino Acid Sequence ; Biology ; Classifiers ; Combined classifier ; Computer Simulation ; Fuzzy ; Fuzzy set theory ; Information content ; Mathematical analysis ; Molecular Sequence Data ; Operators ; Position specific scoring matrix ; Position-Specific Scoring Matrices ; Protein fold classification ; Protein Folding ; Protein Structure, Tertiary ; Proteins ; Proteins - chemistry ; Proteins - classification ; Proteins - genetics ; Sequence based feature ; Vectors (mathematics)</subject><ispartof>Computational biology and chemistry, 2011-02, Vol.35 (1), p.1-9</ispartof><rights>2010 Elsevier Ltd</rights><rights>Copyright © 2010 Elsevier Ltd. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c411t-7c00cafd62ae6b782d8a9291e5ac215153cfb90b9a6bb5b083b87c46d66927643</citedby><cites>FETCH-LOGICAL-c411t-7c00cafd62ae6b782d8a9291e5ac215153cfb90b9a6bb5b083b87c46d66927643</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S1476927110001106$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3536,27903,27904,65309</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/21216672$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Kavousi, Kaveh</creatorcontrib><creatorcontrib>Moshiri, Behzad</creatorcontrib><creatorcontrib>Sadeghi, Mehdi</creatorcontrib><creatorcontrib>Araabi, Babak N.</creatorcontrib><creatorcontrib>Moosavi-Movahedi, Ali Akbar</creatorcontrib><title>A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM</title><title>Computational biology and chemistry</title><addtitle>Comput Biol Chem</addtitle><description>. [Display omitted] ▶ In this study we use combined classifier for identification of protein domain fold. ▶ Information content of extracted features of protein has been introduced to face this problem. ▶ We show that position specific scoring matrix improves the correct classification rate. ▶ Results provide deeper interpretation about the effectiveness of each feature for discrimination. Protein function is related to its chemical reaction to the surrounding environment including other proteins. On the other hand, this depends on the spatial shape and tertiary structure of protein and folding of its constituent components in space. 
The correct identification of protein domain fold solely using extracted information from protein sequence is a complicated and controversial task in the current computational biology. In this article a combined classifier based on the information content of extracted features from the primary structure of protein has been introduced to face this challenging problem. In the first stage of our proposed two-tier architecture, there are several classifiers each of which is trained with a different sequence based feature vector. Apart from the application of the predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, and different dimensions of pseudo-amino acid composition vectors in similar studies, the position specific scoring matrix (PSSM) has also been used to improve the correct classification rate (CCR) in this study. Using K-fold cross validation on training dataset related to 27 famous folds of SCOP, the 28 dimensional probability output vector from each evidence theoretic K-NN classifier is used to determine the information content or expertness of corresponding feature for discrimination in each fold class. In the second stage, the outputs of classifiers for test dataset are fused using Sugeno fuzzy integral operator to make better decision for target fold class. The expertness factor of each classifier in each fold class has been used to calculate the fuzzy integral operator weights. 
Results make it possible to provide deeper interpretation about the effectiveness of each feature for discrimination in target classes for query proteins.</description><subject>Amino Acid Sequence</subject><subject>Biology</subject><subject>Classifiers</subject><subject>Combined classifier</subject><subject>Computer Simulation</subject><subject>Fuzzy</subject><subject>Fuzzy set theory</subject><subject>Information content</subject><subject>Mathematical analysis</subject><subject>Molecular Sequence Data</subject><subject>Operators</subject><subject>Position specific scoring matrix</subject><subject>Position-Specific Scoring Matrices</subject><subject>Protein fold classification</subject><subject>Protein Folding</subject><subject>Protein Structure, Tertiary</subject><subject>Proteins</subject><subject>Proteins - chemistry</subject><subject>Proteins - classification</subject><subject>Proteins - genetics</subject><subject>Sequence based feature</subject><subject>Vectors (mathematics)</subject><issn>1476-9271</issn><issn>1476-928X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkUtLAzEUhYMoVqt_QYIbV61JppOZcSe-QVGogruQx42mzExqMlPovzeltbjTTR7ku_fknoPQKSVjSig_n421b-bK-Vp_QjNmZPXAxoTQHXRAJwUfVax8392eCzpAhzHOCGEZIfk-GjDKKOcFO0DmEs-D78C12PraYF3LGJ11ENI9NGCwWmLbR9d-YOOshQBthxtvIGJv8TxCbzyWjWvTql1qkL7mo-ucb_HCSfwynT4doT0r6wjHm32I3m5vXq_uR4_Pdw9Xl48jPaG0GxWaEC2t4UwCV0XJTCkrVlHIpWY0p3mmraqIqiRXKlekzFRZ6Ak3nKch-SQborN13zTSVw-xE42LGupatuD7KEpelRljlPxN5jkjSXpFXqxJHXyMAayYB9fIsBSUiFUcYiZ-xyFWcQjKRIojFZ9sZHqVvNyW_vifgOs1AMmWRXJdRO2g1WBcAN0J491_dL4BZWmi1g</recordid><startdate>20110201</startdate><enddate>20110201</enddate><creator>Kavousi, Kaveh</creator><creator>Moshiri, Behzad</creator><creator>Sadeghi, Mehdi</creator><creator>Araabi, Babak N.</creator><creator>Moosavi-Movahedi, Ali Akbar</creator><general>Elsevier 
Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7SC</scope><scope>7U5</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20110201</creationdate><title>A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM</title><author>Kavousi, Kaveh ; Moshiri, Behzad ; Sadeghi, Mehdi ; Araabi, Babak N. ; Moosavi-Movahedi, Ali Akbar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c411t-7c00cafd62ae6b782d8a9291e5ac215153cfb90b9a6bb5b083b87c46d66927643</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Amino Acid Sequence</topic><topic>Biology</topic><topic>Classifiers</topic><topic>Combined classifier</topic><topic>Computer Simulation</topic><topic>Fuzzy</topic><topic>Fuzzy set theory</topic><topic>Information content</topic><topic>Mathematical analysis</topic><topic>Molecular Sequence Data</topic><topic>Operators</topic><topic>Position specific scoring matrix</topic><topic>Position-Specific Scoring Matrices</topic><topic>Protein fold classification</topic><topic>Protein Folding</topic><topic>Protein Structure, Tertiary</topic><topic>Proteins</topic><topic>Proteins - chemistry</topic><topic>Proteins - classification</topic><topic>Proteins - genetics</topic><topic>Sequence based feature</topic><topic>Vectors (mathematics)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kavousi, Kaveh</creatorcontrib><creatorcontrib>Moshiri, Behzad</creatorcontrib><creatorcontrib>Sadeghi, Mehdi</creatorcontrib><creatorcontrib>Araabi, Babak N.</creatorcontrib><creatorcontrib>Moosavi-Movahedi, Ali 
Akbar</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Computer and Information Systems Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Computational biology and chemistry</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kavousi, Kaveh</au><au>Moshiri, Behzad</au><au>Sadeghi, Mehdi</au><au>Araabi, Babak N.</au><au>Moosavi-Movahedi, Ali Akbar</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM</atitle><jtitle>Computational biology and chemistry</jtitle><addtitle>Comput Biol Chem</addtitle><date>2011-02-01</date><risdate>2011</risdate><volume>35</volume><issue>1</issue><spage>1</spage><epage>9</epage><pages>1-9</pages><issn>1476-9271</issn><eissn>1476-928X</eissn><abstract>. [Display omitted] ▶ In this study we use combined classifier for identification of protein domain fold. ▶ Information content of extracted features of protein has been introduced to face this problem. ▶ We show that position specific scoring matrix improves the correct classification rate. ▶ Results provide deeper interpretation about the effectiveness of each feature for discrimination. 
Protein function is related to its chemical reaction to the surrounding environment including other proteins. On the other hand, this depends on the spatial shape and tertiary structure of protein and folding of its constituent components in space. The correct identification of protein domain fold solely using extracted information from protein sequence is a complicated and controversial task in the current computational biology. In this article a combined classifier based on the information content of extracted features from the primary structure of protein has been introduced to face this challenging problem. In the first stage of our proposed two-tier architecture, there are several classifiers each of which is trained with a different sequence based feature vector. Apart from the application of the predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, and different dimensions of pseudo-amino acid composition vectors in similar studies, the position specific scoring matrix (PSSM) has also been used to improve the correct classification rate (CCR) in this study. Using K-fold cross validation on training dataset related to 27 famous folds of SCOP, the 28 dimensional probability output vector from each evidence theoretic K-NN classifier is used to determine the information content or expertness of corresponding feature for discrimination in each fold class. In the second stage, the outputs of classifiers for test dataset are fused using Sugeno fuzzy integral operator to make better decision for target fold class. The expertness factor of each classifier in each fold class has been used to calculate the fuzzy integral operator weights. 
Results make it possible to provide deeper interpretation about the effectiveness of each feature for discrimination in target classes for query proteins.</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>21216672</pmid><doi>10.1016/j.compbiolchem.2010.12.001</doi><tpages>9</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1476-9271
ispartof Computational biology and chemistry, 2011-02, Vol.35 (1), p.1-9
issn 1476-9271
1476-928X
language eng
recordid cdi_proquest_miscellaneous_869832210
source MEDLINE; Elsevier ScienceDirect Journals
subjects Amino Acid Sequence
Biology
Classifiers
Combined classifier
Computer Simulation
Fuzzy
Fuzzy set theory
Information content
Mathematical analysis
Molecular Sequence Data
Operators
Position specific scoring matrix
Position-Specific Scoring Matrices
Protein fold classification
Protein Folding
Protein Structure, Tertiary
Proteins
Proteins - chemistry
Proteins - classification
Proteins - genetics
Sequence based feature
Vectors (mathematics)
title A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T12%3A03%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20protein%20fold%20classifier%20formed%20by%20fusing%20different%20modes%20of%20pseudo%20amino%20acid%20composition%20via%20PSSM&rft.jtitle=Computational%20biology%20and%20chemistry&rft.au=Kavousi,%20Kaveh&rft.date=2011-02-01&rft.volume=35&rft.issue=1&rft.spage=1&rft.epage=9&rft.pages=1-9&rft.issn=1476-9271&rft.eissn=1476-928X&rft_id=info:doi/10.1016/j.compbiolchem.2010.12.001&rft_dat=%3Cproquest_cross%3E855202910%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=855202910&rft_id=info:pmid/21216672&rft_els_id=S1476927110001106&rfr_iscdi=true