Binary coding, mRNA information and protein structure
We describe a new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of a few binary patterns of coded amino acid and...
Published in: | 26th International Conference on Information Technology Interfaces, 2004, 2004-01, Vol.12 (2), p.53-61 Vol.1 |
---|---|
Main authors: | Stambuk, N. ; Konjevoda, P. ; Gotovac, N. |
Format: | Article |
Language: | eng |
Subjects: | |
container_end_page | 61 Vol.1 |
---|---|
container_issue | 2 |
container_start_page | 53 |
container_title | 26th International Conference on Information Technology Interfaces, 2004 |
container_volume | 12 |
creator | Stambuk, N. ; Konjevoda, P. ; Gotovac, N. |
description | We describe a new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of a few binary patterns of coded amino acid and nucleotide physicochemical properties. The algorithm was tested with the machine learning SMO (sequential minimal optimisation) classifier for support vector machines and with classification trees, on a dataset of 140 dissimilar protein folds. Depending on the method of testing, the overall classification accuracy was 91.43%-100% and the tenfold cross-validation result of the procedure was 83.57% to >90%. Genetic code randomisation analysis, based on 100,000 different codes tested for protein fold prediction quality, indicated that: a) there is a very low chance, $p = 2.7 \times 10^{-4}$, that a code better than the natural one specified by the binary coding algorithm is randomly produced; b) dipeptides represent basic protein units with respect to how the natural genetic code defines secondary protein structure. |
doi_str_mv | 10.2498/cit.2004.02.02 |
format | Article |
fullrecord | <record><control><sourceid>proquest_6IE</sourceid><recordid>TN_cdi_hrcak_primary_oai_hrcak_srce_hr_44722</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1372374</ieee_id><sourcerecordid>57582582</sourcerecordid><originalsourceid>FETCH-LOGICAL-c387t-bdfea833a7895c75481a7707a781f257aa77979438ead22ba29d3cd58a87774c3</originalsourceid><addsrcrecordid>eNqFkMtPxCAQxomPxPVx9eKlFz3ZlTLQgeNqfCVGE6NnMlKq6G6r0D3438umxj1KvgSG-c2XycfYYcWnQhp95sIwFZzLKRdZG2xSaVmXYLjeZLtGgamxNqbayg0AXlYV1DvsIKV3ng8YJThOmDoPHcXvwvVN6F5Pi8Xj_awIXdvHBQ2h7wrqmuIz9oMPXZGGuHTDMvp9tt3SPPmD33uPPV9dPl3clHcP17cXs7vSgcahfGlaTxqAUBvlUEldESLHXFetUEi5MmgkaE-NEC8kTAOuUZo0IkoHe-x09H2Ljj7sZwyLvKztKdjxJ0Xn89NKiUJk_GTE88JfS58GuwjJ-fmcOt8vk1WotMjK4HQEXexTir79s664XYVrc7h2Fa7lIisPHP86U3I0byN1LqT1VA214SD_57JlXZvMHY1c8N6v24ACUMIPXEOMZQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>57582582</pqid></control><display><type>article</type><title>Binary coding, mRNA information and protein structure</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Stambuk, N. ; Konjevoda, P. ; Gotovac, N.</creator><creatorcontrib>Stambuk, N. ; Konjevoda, P. ; Gotovac, N.</creatorcontrib><description>We describe new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of few binary patterns of coded amino acid and nucleotide physicochemical properties. The algorithm was tested with machine learning SMO (sequential minimal optimisation) classifier for the support vector machines and classification trees, on a dataset of 140 dissimilar protein folds. Depending on the method of testing, the overall classification accuracy was 91.43%-100% and the tenfold cross-validation result of the procedure was 83.57%->90%. 
Genetic code randomisation analysis based on 100,000 different codes tested for the protein fold prediction quality indicated that: a) there is a very low chance of p=2.7times10 -4 that a better code than the natural one specified by the binary coding algorithm is randomly produced, b) dipeptides represent basic protein units with respect to the natural genetic code defining of the secondary protein structure</description><identifier>ISSN: 1330-1136</identifier><identifier>ISBN: 9539676991</identifier><identifier>ISBN: 9789539676993</identifier><identifier>EISSN: 1846-3908</identifier><identifier>DOI: 10.2498/cit.2004.02.02</identifier><identifier>CODEN: CJCTEM</identifier><language>eng</language><publisher>Zagreb: IEEE</publisher><subject>Amino acids ; Analytical biochemistry: general aspects, technics, instrumentation ; Analytical, structural and metabolic biochemistry ; Applied sciences ; Biological and medical sciences ; Chemical structures ; Classification tree analysis ; Computer applications ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; DNA ; Exact sciences and technology ; Fundamental and applied biological sciences. Psychology ; Genetics ; Information systems. Data bases ; Memory organisation. 
Data processing ; Microcomputers ; Prediction algorithms ; Proteins ; RNA ; Sequences ; Software ; Testing</subject><ispartof>26th International Conference on Information Technology Interfaces, 2004, 2004-01, Vol.12 (2), p.53-61 Vol.1</ispartof><rights>2004 INIST-CNRS</rights><rights>2005 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c387t-bdfea833a7895c75481a7707a781f257aa77979438ead22ba29d3cd58a87774c3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1372374$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,309,310,314,780,784,789,790,885,2058,27924,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1372374$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16004669$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16369034$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Stambuk, N.</creatorcontrib><creatorcontrib>Konjevoda, P.</creatorcontrib><creatorcontrib>Gotovac, N.</creatorcontrib><title>Binary coding, mRNA information and protein structure</title><title>26th International Conference on Information Technology Interfaces, 2004</title><addtitle>ITI</addtitle><description>We describe new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of few binary patterns of coded amino acid and nucleotide physicochemical properties. 
The algorithm was tested with machine learning SMO (sequential minimal optimisation) classifier for the support vector machines and classification trees, on a dataset of 140 dissimilar protein folds. Depending on the method of testing, the overall classification accuracy was 91.43%-100% and the tenfold cross-validation result of the procedure was 83.57%->90%. Genetic code randomisation analysis based on 100,000 different codes tested for the protein fold prediction quality indicated that: a) there is a very low chance of p=2.7times10 -4 that a better code than the natural one specified by the binary coding algorithm is randomly produced, b) dipeptides represent basic protein units with respect to the natural genetic code defining of the secondary protein structure</description><subject>Amino acids</subject><subject>Analytical biochemistry: general aspects, technics, instrumentation</subject><subject>Analytical, structural and metabolic biochemistry</subject><subject>Applied sciences</subject><subject>Biological and medical sciences</subject><subject>Chemical structures</subject><subject>Classification tree analysis</subject><subject>Computer applications</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems and distributed systems. User interface</subject><subject>DNA</subject><subject>Exact sciences and technology</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Genetics</subject><subject>Information systems. Data bases</subject><subject>Memory organisation. 
Data processing</subject><subject>Microcomputers</subject><subject>Prediction algorithms</subject><subject>Proteins</subject><subject>RNA</subject><subject>Sequences</subject><subject>Software</subject><subject>Testing</subject><issn>1330-1136</issn><issn>1846-3908</issn><isbn>9539676991</isbn><isbn>9789539676993</isbn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNqFkMtPxCAQxomPxPVx9eKlFz3ZlTLQgeNqfCVGE6NnMlKq6G6r0D3438umxj1KvgSG-c2XycfYYcWnQhp95sIwFZzLKRdZG2xSaVmXYLjeZLtGgamxNqbayg0AXlYV1DvsIKV3ng8YJThOmDoPHcXvwvVN6F5Pi8Xj_awIXdvHBQ2h7wrqmuIz9oMPXZGGuHTDMvp9tt3SPPmD33uPPV9dPl3clHcP17cXs7vSgcahfGlaTxqAUBvlUEldESLHXFetUEi5MmgkaE-NEC8kTAOuUZo0IkoHe-x09H2Ljj7sZwyLvKztKdjxJ0Xn89NKiUJk_GTE88JfS58GuwjJ-fmcOt8vk1WotMjK4HQEXexTir79s664XYVrc7h2Fa7lIisPHP86U3I0byN1LqT1VA214SD_57JlXZvMHY1c8N6v24ACUMIPXEOMZQ</recordid><startdate>20040101</startdate><enddate>20040101</enddate><creator>Stambuk, N.</creator><creator>Konjevoda, P.</creator><creator>Gotovac, N.</creator><general>IEEE</general><general>University Computing Centre</general><general>Fakultet elektrotehnike i računarstva Sveučilišta u Zagrebu</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope><scope>VP8</scope></search><sort><creationdate>20040101</creationdate><title>Binary coding, mRNA information and protein structure</title><author>Stambuk, N. ; Konjevoda, P. 
; Gotovac, N.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c387t-bdfea833a7895c75481a7707a781f257aa77979438ead22ba29d3cd58a87774c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Amino acids</topic><topic>Analytical biochemistry: general aspects, technics, instrumentation</topic><topic>Analytical, structural and metabolic biochemistry</topic><topic>Applied sciences</topic><topic>Biological and medical sciences</topic><topic>Chemical structures</topic><topic>Classification tree analysis</topic><topic>Computer applications</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems and distributed systems. User interface</topic><topic>DNA</topic><topic>Exact sciences and technology</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Genetics</topic><topic>Information systems. Data bases</topic><topic>Memory organisation. 
Data processing</topic><topic>Microcomputers</topic><topic>Prediction algorithms</topic><topic>Proteins</topic><topic>RNA</topic><topic>Sequences</topic><topic>Software</topic><topic>Testing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Stambuk, N.</creatorcontrib><creatorcontrib>Konjevoda, P.</creatorcontrib><creatorcontrib>Gotovac, N.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>Hrcak: Portal of scientific journals of Croatia</collection><jtitle>26th International Conference on Information Technology Interfaces, 2004</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Stambuk, N.</au><au>Konjevoda, P.</au><au>Gotovac, N.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Binary coding, mRNA information and protein structure</atitle><jtitle>26th International Conference on Information Technology Interfaces, 2004</jtitle><stitle>ITI</stitle><date>2004-01-01</date><risdate>2004</risdate><volume>12</volume><issue>2</issue><spage>53</spage><epage>61 Vol.1</epage><pages>53-61 Vol.1</pages><issn>1330-1136</issn><eissn>1846-3908</eissn><isbn>9539676991</isbn><isbn>9789539676993</isbn><coden>CJCTEM</coden><abstract>We describe new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. 
The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of few binary patterns of coded amino acid and nucleotide physicochemical properties. The algorithm was tested with machine learning SMO (sequential minimal optimisation) classifier for the support vector machines and classification trees, on a dataset of 140 dissimilar protein folds. Depending on the method of testing, the overall classification accuracy was 91.43%-100% and the tenfold cross-validation result of the procedure was 83.57%->90%. Genetic code randomisation analysis based on 100,000 different codes tested for the protein fold prediction quality indicated that: a) there is a very low chance of p=2.7times10 -4 that a better code than the natural one specified by the binary coding algorithm is randomly produced, b) dipeptides represent basic protein units with respect to the natural genetic code defining of the secondary protein structure</abstract><cop>Zagreb</cop><pub>IEEE</pub><doi>10.2498/cit.2004.02.02</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1330-1136 |
ispartof | 26th International Conference on Information Technology Interfaces, 2004, 2004-01, Vol.12 (2), p.53-61 Vol.1 |
issn | 1330-1136 1846-3908 |
language | eng |
recordid | cdi_hrcak_primary_oai_hrcak_srce_hr_44722 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Amino acids ; Analytical biochemistry: general aspects, technics, instrumentation ; Analytical, structural and metabolic biochemistry ; Applied sciences ; Biological and medical sciences ; Chemical structures ; Classification tree analysis ; Computer applications ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; DNA ; Exact sciences and technology ; Fundamental and applied biological sciences. Psychology ; Genetics ; Information systems. Data bases ; Memory organisation. Data processing ; Microcomputers ; Prediction algorithms ; Proteins ; RNA ; Sequences ; Software ; Testing |
title | Binary coding, mRNA information and protein structure |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T13%3A46%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Binary%20coding,%20mRNA%20information%20and%20protein%20structure&rft.jtitle=26th%20International%20Conference%20on%20Information%20Technology%20Interfaces,%202004&rft.au=Stambuk,%20N.&rft.date=2004-01-01&rft.volume=12&rft.issue=2&rft.spage=53&rft.epage=61%20Vol.1&rft.pages=53-61%20Vol.1&rft.issn=1330-1136&rft.eissn=1846-3908&rft.isbn=9539676991&rft.isbn_list=9789539676993&rft.coden=CJCTEM&rft_id=info:doi/10.2498/cit.2004.02.02&rft_dat=%3Cproquest_6IE%3E57582582%3C/proquest_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=57582582&rft_id=info:pmid/&rft_ieee_id=1372374&rfr_iscdi=true |
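
The percentages and the p-value quoted in the description field can be cross-checked with simple arithmetic against the two totals the record states (140 protein folds, 100,000 randomised codes). The sketch below is only a plausibility check, not the authors' procedure: the function names `implied_count` and `empirical_p` are hypothetical, and the whole-number counts it derives are inferences from the rounded figures, since the record itself does not state them.

```python
N_FOLDS = 140      # dataset size stated in the description field
N_CODES = 100_000  # number of randomised genetic codes stated in the description field


def implied_count(percent: float, total: int) -> int:
    """Return the whole-number count whose share of `total` rounds to `percent`.

    Hypothetical helper: raises ValueError if no whole count reproduces
    the percentage to two decimal places.
    """
    count = round(percent / 100 * total)
    if round(100 * count / total, 2) != round(percent, 2):
        raise ValueError(f"{percent}% is not a whole number of {total} items")
    return count


def empirical_p(better: int, total: int) -> float:
    """Share of randomised codes that scored better than the reference code."""
    return better / total


if __name__ == "__main__":
    # Lower bound of the reported overall classification accuracy.
    print(implied_count(91.43, N_FOLDS))
    # Lower bound of the reported tenfold cross-validation result.
    print(implied_count(83.57, N_FOLDS))
    # Number of randomised codes implied by p = 2.7e-4 (an inference).
    print(round(2.7e-4 * N_CODES))
```

Under these assumptions, 91.43% corresponds to 128 of 140 folds, 83.57% to 117 of 140, and $p = 2.7 \times 10^{-4}$ to 27 of the 100,000 randomised codes outperforming the natural one.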