Binary coding, mRNA information and protein structure

We describe a new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of a few binary patterns of coded amino acid and...

Bibliographic details
Published in: 26th International Conference on Information Technology Interfaces, 2004, 2004-01, Vol.12 (2), p.53-61 Vol.1
Main authors: Stambuk, N., Konjevoda, P., Gotovac, N.
Format: Article
Language: eng
container_end_page 61 Vol.1
container_issue 2
container_start_page 53
container_title 26th International Conference on Information Technology Interfaces, 2004
container_volume 12
creator Stambuk, N.
Konjevoda, P.
Gotovac, N.
description We describe a new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of a few binary patterns of coded amino acid and nucleotide physicochemical properties. The algorithm was tested with the machine learning SMO (sequential minimal optimisation) classifier for support vector machines and with classification trees, on a dataset of 140 dissimilar protein folds. Depending on the method of testing, the overall classification accuracy was 91.43% to 100% and the tenfold cross-validation result of the procedure was 83.57% to >90%. Genetic code randomisation analysis, based on 100,000 different codes tested for protein fold prediction quality, indicated that: a) there is a very low chance, p = 2.7 × 10^-4, that a code better than the natural one specified by the binary coding algorithm is randomly produced; b) dipeptides represent basic protein units with respect to how the natural genetic code defines secondary protein structure.
doi_str_mv 10.2498/cit.2004.02.02
format Article
fullrecord <record><control><sourceid>proquest_6IE</sourceid><recordid>TN_cdi_hrcak_primary_oai_hrcak_srce_hr_44722</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1372374</ieee_id><sourcerecordid>57582582</sourcerecordid><originalsourceid>FETCH-LOGICAL-c387t-bdfea833a7895c75481a7707a781f257aa77979438ead22ba29d3cd58a87774c3</originalsourceid><addsrcrecordid>eNqFkMtPxCAQxomPxPVx9eKlFz3ZlTLQgeNqfCVGE6NnMlKq6G6r0D3438umxj1KvgSG-c2XycfYYcWnQhp95sIwFZzLKRdZG2xSaVmXYLjeZLtGgamxNqbayg0AXlYV1DvsIKV3ng8YJThOmDoPHcXvwvVN6F5Pi8Xj_awIXdvHBQ2h7wrqmuIz9oMPXZGGuHTDMvp9tt3SPPmD33uPPV9dPl3clHcP17cXs7vSgcahfGlaTxqAUBvlUEldESLHXFetUEi5MmgkaE-NEC8kTAOuUZo0IkoHe-x09H2Ljj7sZwyLvKztKdjxJ0Xn89NKiUJk_GTE88JfS58GuwjJ-fmcOt8vk1WotMjK4HQEXexTir79s664XYVrc7h2Fa7lIisPHP86U3I0byN1LqT1VA214SD_57JlXZvMHY1c8N6v24ACUMIPXEOMZQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>57582582</pqid></control><display><type>article</type><title>Binary coding, mRNA information and protein structure</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Stambuk, N. ; Konjevoda, P. ; Gotovac, N.</creator><creatorcontrib>Stambuk, N. ; Konjevoda, P. ; Gotovac, N.</creatorcontrib><description>We describe new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of few binary patterns of coded amino acid and nucleotide physicochemical properties. The algorithm was tested with machine learning SMO (sequential minimal optimisation) classifier for the support vector machines and classification trees, on a dataset of 140 dissimilar protein folds. Depending on the method of testing, the overall classification accuracy was 91.43%-100% and the tenfold cross-validation result of the procedure was 83.57%-&gt;90%. 
Genetic code randomisation analysis based on 100,000 different codes tested for the protein fold prediction quality indicated that: a) there is a very low chance of p=2.7times10 -4 that a better code than the natural one specified by the binary coding algorithm is randomly produced, b) dipeptides represent basic protein units with respect to the natural genetic code defining of the secondary protein structure</description><identifier>ISSN: 1330-1136</identifier><identifier>ISBN: 9539676991</identifier><identifier>ISBN: 9789539676993</identifier><identifier>EISSN: 1846-3908</identifier><identifier>DOI: 10.2498/cit.2004.02.02</identifier><identifier>CODEN: CJCTEM</identifier><language>eng</language><publisher>Zagreb: IEEE</publisher><subject>Amino acids ; Analytical biochemistry: general aspects, technics, instrumentation ; Analytical, structural and metabolic biochemistry ; Applied sciences ; Biological and medical sciences ; Chemical structures ; Classification tree analysis ; Computer applications ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; DNA ; Exact sciences and technology ; Fundamental and applied biological sciences. Psychology ; Genetics ; Information systems. Data bases ; Memory organisation. 
Data processing ; Microcomputers ; Prediction algorithms ; Proteins ; RNA ; Sequences ; Software ; Testing</subject><ispartof>26th International Conference on Information Technology Interfaces, 2004, 2004-01, Vol.12 (2), p.53-61 Vol.1</ispartof><rights>2004 INIST-CNRS</rights><rights>2005 INIST-CNRS</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c387t-bdfea833a7895c75481a7707a781f257aa77979438ead22ba29d3cd58a87774c3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1372374$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,309,310,314,780,784,789,790,885,2058,27924,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/1372374$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=16004669$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=16369034$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Stambuk, N.</creatorcontrib><creatorcontrib>Konjevoda, P.</creatorcontrib><creatorcontrib>Gotovac, N.</creatorcontrib><title>Binary coding, mRNA information and protein structure</title><title>26th International Conference on Information Technology Interfaces, 2004</title><addtitle>ITI</addtitle><description>We describe new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of few binary patterns of coded amino acid and nucleotide physicochemical properties. 
The algorithm was tested with machine learning SMO (sequential minimal optimisation) classifier for the support vector machines and classification trees, on a dataset of 140 dissimilar protein folds. Depending on the method of testing, the overall classification accuracy was 91.43%-100% and the tenfold cross-validation result of the procedure was 83.57%-&gt;90%. Genetic code randomisation analysis based on 100,000 different codes tested for the protein fold prediction quality indicated that: a) there is a very low chance of p=2.7times10 -4 that a better code than the natural one specified by the binary coding algorithm is randomly produced, b) dipeptides represent basic protein units with respect to the natural genetic code defining of the secondary protein structure</description><subject>Amino acids</subject><subject>Analytical biochemistry: general aspects, technics, instrumentation</subject><subject>Analytical, structural and metabolic biochemistry</subject><subject>Applied sciences</subject><subject>Biological and medical sciences</subject><subject>Chemical structures</subject><subject>Classification tree analysis</subject><subject>Computer applications</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems and distributed systems. User interface</subject><subject>DNA</subject><subject>Exact sciences and technology</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Genetics</subject><subject>Information systems. Data bases</subject><subject>Memory organisation. 
Data processing</subject><subject>Microcomputers</subject><subject>Prediction algorithms</subject><subject>Proteins</subject><subject>RNA</subject><subject>Sequences</subject><subject>Software</subject><subject>Testing</subject><issn>1330-1136</issn><issn>1846-3908</issn><isbn>9539676991</isbn><isbn>9789539676993</isbn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNqFkMtPxCAQxomPxPVx9eKlFz3ZlTLQgeNqfCVGE6NnMlKq6G6r0D3438umxj1KvgSG-c2XycfYYcWnQhp95sIwFZzLKRdZG2xSaVmXYLjeZLtGgamxNqbayg0AXlYV1DvsIKV3ng8YJThOmDoPHcXvwvVN6F5Pi8Xj_awIXdvHBQ2h7wrqmuIz9oMPXZGGuHTDMvp9tt3SPPmD33uPPV9dPl3clHcP17cXs7vSgcahfGlaTxqAUBvlUEldESLHXFetUEi5MmgkaE-NEC8kTAOuUZo0IkoHe-x09H2Ljj7sZwyLvKztKdjxJ0Xn89NKiUJk_GTE88JfS58GuwjJ-fmcOt8vk1WotMjK4HQEXexTir79s664XYVrc7h2Fa7lIisPHP86U3I0byN1LqT1VA214SD_57JlXZvMHY1c8N6v24ACUMIPXEOMZQ</recordid><startdate>20040101</startdate><enddate>20040101</enddate><creator>Stambuk, N.</creator><creator>Konjevoda, P.</creator><creator>Gotovac, N.</creator><general>IEEE</general><general>University Computing Centre</general><general>Fakultet elektrotehnike i računarstva Sveučilišta u Zagrebu</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>E3H</scope><scope>F2A</scope><scope>VP8</scope></search><sort><creationdate>20040101</creationdate><title>Binary coding, mRNA information and protein structure</title><author>Stambuk, N. ; Konjevoda, P. 
; Gotovac, N.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c387t-bdfea833a7895c75481a7707a781f257aa77979438ead22ba29d3cd58a87774c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Amino acids</topic><topic>Analytical biochemistry: general aspects, technics, instrumentation</topic><topic>Analytical, structural and metabolic biochemistry</topic><topic>Applied sciences</topic><topic>Biological and medical sciences</topic><topic>Chemical structures</topic><topic>Classification tree analysis</topic><topic>Computer applications</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems and distributed systems. User interface</topic><topic>DNA</topic><topic>Exact sciences and technology</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Genetics</topic><topic>Information systems. Data bases</topic><topic>Memory organisation. 
Data processing</topic><topic>Microcomputers</topic><topic>Prediction algorithms</topic><topic>Proteins</topic><topic>RNA</topic><topic>Sequences</topic><topic>Software</topic><topic>Testing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Stambuk, N.</creatorcontrib><creatorcontrib>Konjevoda, P.</creatorcontrib><creatorcontrib>Gotovac, N.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><collection>Hrcak: Portal of scientific journals of Croatia</collection><jtitle>26th International Conference on Information Technology Interfaces, 2004</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Stambuk, N.</au><au>Konjevoda, P.</au><au>Gotovac, N.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Binary coding, mRNA information and protein structure</atitle><jtitle>26th International Conference on Information Technology Interfaces, 2004</jtitle><stitle>ITI</stitle><date>2004-01-01</date><risdate>2004</risdate><volume>12</volume><issue>2</issue><spage>53</spage><epage>61 Vol.1</epage><pages>53-61 Vol.1</pages><issn>1330-1136</issn><eissn>1846-3908</eissn><isbn>9539676991</isbn><isbn>9789539676993</isbn><coden>CJCTEM</coden><abstract>We describe new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. 
The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of few binary patterns of coded amino acid and nucleotide physicochemical properties. The algorithm was tested with machine learning SMO (sequential minimal optimisation) classifier for the support vector machines and classification trees, on a dataset of 140 dissimilar protein folds. Depending on the method of testing, the overall classification accuracy was 91.43%-100% and the tenfold cross-validation result of the procedure was 83.57%-&gt;90%. Genetic code randomisation analysis based on 100,000 different codes tested for the protein fold prediction quality indicated that: a) there is a very low chance of p=2.7times10 -4 that a better code than the natural one specified by the binary coding algorithm is randomly produced, b) dipeptides represent basic protein units with respect to the natural genetic code defining of the secondary protein structure</abstract><cop>Zagreb</cop><pub>IEEE</pub><doi>10.2498/cit.2004.02.02</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1330-1136
ispartof 26th International Conference on Information Technology Interfaces, 2004, 2004-01, Vol.12 (2), p.53-61 Vol.1
issn 1330-1136
1846-3908
language eng
recordid cdi_hrcak_primary_oai_hrcak_srce_hr_44722
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Amino acids
Analytical biochemistry: general aspects, technics, instrumentation
Analytical, structural and metabolic biochemistry
Applied sciences
Biological and medical sciences
Chemical structures
Classification tree analysis
Computer applications
Computer science; control theory; systems
Computer systems and distributed systems. User interface
DNA
Exact sciences and technology
Fundamental and applied biological sciences. Psychology
Genetics
Information systems. Data bases
Memory organisation. Data processing
Microcomputers
Prediction algorithms
Proteins
RNA
Sequences
Software
Testing
title Binary coding, mRNA information and protein structure
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T13%3A46%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Binary%20coding,%20mRNA%20information%20and%20protein%20structure&rft.jtitle=26th%20International%20Conference%20on%20Information%20Technology%20Interfaces,%202004&rft.au=Stambuk,%20N.&rft.date=2004-01-01&rft.volume=12&rft.issue=2&rft.spage=53&rft.epage=61%20Vol.1&rft.pages=53-61%20Vol.1&rft.issn=1330-1136&rft.eissn=1846-3908&rft.isbn=9539676991&rft.isbn_list=9789539676993&rft.coden=CJCTEM&rft_id=info:doi/10.2498/cit.2004.02.02&rft_dat=%3Cproquest_6IE%3E57582582%3C/proquest_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=57582582&rft_id=info:pmid/&rft_ieee_id=1372374&rfr_iscdi=true
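
The figures quoted in the description field can be translated into whole counts with a little arithmetic. The sketch below is only a back-of-envelope check, not the authors' method: the helper names are hypothetical, and reading p as the fraction of randomised codes that outperform the natural code is an assumption the record does not state explicitly.

```python
# Back-of-envelope check of the figures quoted in the description field.
# All names here are illustrative; nothing below comes from the paper's code.

N_FOLDS = 140        # dataset size stated in the description
N_CODES = 100_000    # number of randomised genetic codes stated in the description


def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of items that a reported percentage implies."""
    return round(percent / 100 * total)


def as_percent(count: int, total: int) -> float:
    """Percentage, rounded to two decimals, that a count represents."""
    return round(100 * count / total, 2)


# Lower bound of the overall classification accuracy (91.43%).
overall_low = implied_count(91.43, N_FOLDS)

# Lower bound of the tenfold cross-validation result (83.57%).
cv_low = implied_count(83.57, N_FOLDS)

# ASSUMPTION: p = 2.7e-4 is read as the fraction of the random codes
# that predicted folds better than the natural code.
better_codes = round(2.7e-4 * N_CODES)

print(overall_low, as_percent(overall_low, N_FOLDS))
print(cv_low, as_percent(cv_low, N_FOLDS))
print(better_codes)
```

Under these assumptions, the lower accuracy bounds correspond to whole numbers of correctly classified folds out of 140, which is consistent with the two-decimal percentages given in the record.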