Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve
The Z curve is a three-dimensional space curve constituting the unique representation of a given DNA sequence in the sense that each can be uniquely reconstructed from the other. Based on the Z curve, a new protein coding gene-finding algorithm specific for the yeast genome at better than 95% accura...
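The claim that a DNA sequence and its Z curve "can be uniquely reconstructed from the other" can be illustrated with a short Python sketch. It assumes the standard Z-curve definition from the wider Z-curve literature; the formulas are not stated in this record. With cumulative base counts A_n, C_n, G_n, T_n after n bases, the point is x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n), z_n = (A_n + T_n) - (G_n + C_n). Because A_n + C_n + G_n + T_n = n, the counts can be recovered, for example A_n = (n + x_n + y_n + z_n)/4. The function names `z_curve` and `reconstruct` are hypothetical.

```python
def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Return the Z-curve points (x_n, y_n, z_n) for n = 0..len(seq)."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = [(0, 0, 0)]  # the curve starts at the origin (n = 0)
    for base in seq.upper():
        if base not in counts:
            raise ValueError(f"unsupported base: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        x = (a + g) - (c + t)  # purine vs pyrimidine
        y = (a + c) - (g + t)  # amino vs keto
        z = (a + t) - (g + c)  # weak vs strong hydrogen bonding
        points.append((x, y, z))
    return points


def reconstruct(points: list[tuple[int, int, int]]) -> str:
    """Invert z_curve(): recover cumulative counts, then the base added at each step."""
    seq = []
    prev = (0, 0, 0, 0)  # (A, C, G, T) counts at n = 0
    for n, (x, y, z) in enumerate(points[1:], start=1):
        # each numerator equals exactly 4 * count, so integer division is exact
        a = (n + x + y + z) // 4
        c = (n - x + y - z) // 4
        g = (n + x - y - z) // 4
        t = (n - x - y + z) // 4
        cur = (a, c, g, t)
        # exactly one of the four counts grew by 1 since the previous point
        step = [cur[i] - prev[i] for i in range(4)]
        seq.append("ACGT"[step.index(1)])
        prev = cur
    return "".join(seq)
```

For example, `reconstruct(z_curve("ATGGCC"))` gives back `"ATGGCC"`, which is the round trip the uniqueness statement refers to.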
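Published in: | Nucleic acids research 2000-07, Vol.28 (14), p.2804-2814 |
---|---|
Main authors: | Zhang, C T; Wang, J |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Databases as Topic; DNA, Fungal; Fungal Proteins - genetics; Genes, Fungal - genetics; Genome, Fungal; Open Reading Frames; Reproducibility of Results; Saccharomyces cerevisiae; Saccharomyces cerevisiae - genetics |
Online access: | Full text |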
container_end_page | 2814 |
---|---|
container_issue | 14 |
container_start_page | 2804 |
container_title | Nucleic acids research |
container_volume | 28 |
creator | Zhang, C T; Wang, J |
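description | The Z curve is a three-dimensional space curve constituting the unique representation of a given DNA sequence in the sense that each can be uniquely reconstructed from the other. Based on the Z curve, a new protein coding gene-finding algorithm specific for the yeast genome at better than 95% accuracy has been proposed. Six cross-validation tests were performed to confirm the above accuracy. Using the new algorithm, the number of protein coding genes in the yeast genome is re-estimated. The estimate is based on the assumption that the unknown genes have similar statistical properties to the known genes. It is found that the number of protein coding genes in the 16 yeast chromosomes is ≤5645, significantly smaller than the 5800-6000 which is widely accepted, and much larger than the 4800 estimated by another group recently. The mitochondrial genes were not included into the above estimate. A codingness index called the YZ score (YZ ∈ [0,1]) is proposed to recognize protein coding genes in the yeast genome. Among the ORFs annotated in the MIPS (Munich Information Centre for Protein Sequences) database, those recognized as non-coding by the present algorithm are listed in this paper in detail. The criterion for a coding or non-coding ORF is simply decided by YZ > 0.5 or YZ < 0.5, respectively. The YZ scores for all the ORFs annotated in the MIPS database have been calculated and are available on request by sending e-mail to the corresponding author. |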
doi_str_mv | 10.1093/nar/28.14.2804 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_102655</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>17556282</sourcerecordid><originalsourceid>FETCH-LOGICAL-c509t-556b57fbd859a5a3446dea40f092f9a4f538dc4c769744d645f3aa9321c4135f3</originalsourceid><addsrcrecordid>eNqFkctrGzEQxkVpadK01x6LKLQ3O3qMdleHHkroCwKB0F56EbPaWXuDLbmSNuD_PjIOxe0lJ83j90kz-hh7K8VSCqsvA6ZL1S0lLFUn4Bk7l7pRC7CNen4Sn7FXOd8JIUEaeMnOqlR0Wttztr4lH1dhKlMMPI58l2KhKXAfhyms-IoCZV7zsia-J8zlUIpb4lh4T6VQqi0M3JoPHL2fE_o97zHTwONR9ZvX6j29Zi9G3GR683hesF9fv_y8-r64vvn24-rz9cIbYcvCmKY37dgPnbFoUAM0AyGIUVg1WoTR6G7w4NvGtgBDA2bUiFYr6UHqmlywT8d7d3O_pcFTKAk3bpemLaa9izi5fzthWrtVvHdSqMaYqv_4qE_xz0y5uO2UPW02GCjO2bVSQQOgnwRlW3dRnarg-__AuzinUD_BKSFMo6U6PLs8Qj7FnBONfyeWwh2cdtVppzonwR2croJ3p3ue4Edr9QMi3qRl</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200563125</pqid></control><display><type>article</type><title>Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve</title><source>MEDLINE</source><source>Oxford Journals Open Access Collection</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Zhang, C T ; Wang, J</creator><creatorcontrib>Zhang, C T ; Wang, J</creatorcontrib><description>The Z curve is a three-dimensional space curve constituting the unique representation of a given DNA sequence in the sense that each can be uniquely reconstructed from the other. Based on the Z curve, a new protein coding gene-finding algorithm specific for the yeast genome at better than 95% accuracy has been proposed. Six cross-validation tests were performed to confirm the above accuracy. Using the new algorithm, the number of protein coding genes in the yeast genome is re-estimated. 
The estimate is based on the assumption that the unknown genes have similar statistical properties to the known genes. It is found that the number of protein coding genes in the 16 yeast chromosomes is ≤5645, significantly smaller than the 5800-6000 which is widely accepted, and much larger than the 4800 estimated by another group recently. The mitochondrial genes were not included into the above estimate. A codingness index called the YZ score (YZ ∈ [0,1]) is proposed to recognize protein coding genes in the yeast genome. Among the ORFs annotated in the MIPS (Munich Information Centre for Protein Sequences) database, those recognized as non-coding by the present algorithm are listed in this paper in detail. The criterion for a coding or non-coding ORF is simply decided by YZ > 0.5 or YZ < 0.5, respectively. The YZ scores for all the ORFs annotated in the MIPS database have been calculated and are available on request by sending e-mail to the corresponding author.</description><identifier>ISSN: 1362-4962</identifier><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/28.14.2804</identifier><identifier>PMID: 10908339</identifier><identifier>CODEN: NARHAD</identifier><language>eng</language><publisher>England: Oxford Publishing Limited (England)</publisher><subject>Algorithms ; Databases as Topic ; DNA, Fungal ; Fungal Proteins - genetics ; Genes, Fungal - genetics ; Genome, Fungal ; Open Reading Frames ; Reproducibility of Results ; Saccharomyces cerevisiae ; Saccharomyces cerevisiae - genetics</subject><ispartof>Nucleic acids research, 2000-07, Vol.28 (14), p.2804-2814</ispartof><rights>Copyright Oxford University Press(England) Jul 15, 2000</rights><rights>Copyright © 2000 Oxford University Press 
2000</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c509t-556b57fbd859a5a3446dea40f092f9a4f538dc4c769744d645f3aa9321c4135f3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC102655/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC102655/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,723,776,780,881,27903,27904,53769,53771</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/10908339$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Zhang, C T</creatorcontrib><creatorcontrib>Wang, J</creatorcontrib><title>Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve</title><title>Nucleic acids research</title><addtitle>Nucleic Acids Res</addtitle><description>The Z curve is a three-dimensional space curve constituting the unique representation of a given DNA sequence in the sense that each can be uniquely reconstructed from the other. Based on the Z curve, a new protein coding gene-finding algorithm specific for the yeast genome at better than 95% accuracy has been proposed. Six cross-validation tests were performed to confirm the above accuracy. Using the new algorithm, the number of protein coding genes in the yeast genome is re-estimated. The estimate is based on the assumption that the unknown genes have similar statistical properties to the known genes. It is found that the number of protein coding genes in the 16 yeast chromosomes is ≤5645, significantly smaller than the 5800-6000 which is widely accepted, and much larger than the 4800 estimated by another group recently. 
The mitochondrial genes were not included into the above estimate. A codingness index called the YZ score (YZ ∈ [0,1]) is proposed to recognize protein coding genes in the yeast genome. Among the ORFs annotated in the MIPS (Munich Information Centre for Protein Sequences) database, those recognized as non-coding by the present algorithm are listed in this paper in detail. The criterion for a coding or non-coding ORF is simply decided by YZ > 0.5 or YZ < 0.5, respectively. The YZ scores for all the ORFs annotated in the MIPS database have been calculated and are available on request by sending e-mail to the corresponding author.</description><subject>Algorithms</subject><subject>Databases as Topic</subject><subject>DNA, Fungal</subject><subject>Fungal Proteins - genetics</subject><subject>Genes, Fungal - genetics</subject><subject>Genome, Fungal</subject><subject>Open Reading Frames</subject><subject>Reproducibility of Results</subject><subject>Saccharomyces cerevisiae</subject><subject>Saccharomyces cerevisiae - genetics</subject><issn>1362-4962</issn><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2000</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkctrGzEQxkVpadK01x6LKLQ3O3qMdleHHkroCwKB0F56EbPaWXuDLbmSNuD_PjIOxe0lJ83j90kz-hh7K8VSCqsvA6ZL1S0lLFUn4Bk7l7pRC7CNen4Sn7FXOd8JIUEaeMnOqlR0Wttztr4lH1dhKlMMPI58l2KhKXAfhyms-IoCZV7zsia-J8zlUIpb4lh4T6VQqi0M3JoPHL2fE_o97zHTwONR9ZvX6j29Zi9G3GR683hesF9fv_y8-r64vvn24-rz9cIbYcvCmKY37dgPnbFoUAM0AyGIUVg1WoTR6G7w4NvGtgBDA2bUiFYr6UHqmlywT8d7d3O_pcFTKAk3bpemLaa9izi5fzthWrtVvHdSqMaYqv_4qE_xz0y5uO2UPW02GCjO2bVSQQOgnwRlW3dRnarg-__AuzinUD_BKSFMo6U6PLs8Qj7FnBONfyeWwh2cdtVppzonwR2croJ3p3ue4Edr9QMi3qRl</recordid><startdate>20000715</startdate><enddate>20000715</enddate><creator>Zhang, C T</creator><creator>Wang, J</creator><general>Oxford Publishing Limited (England)</general><general>Oxford University 
Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QL</scope><scope>7QO</scope><scope>7QP</scope><scope>7QR</scope><scope>7SS</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>K9.</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20000715</creationdate><title>Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve</title><author>Zhang, C T ; Wang, J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c509t-556b57fbd859a5a3446dea40f092f9a4f538dc4c769744d645f3aa9321c4135f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2000</creationdate><topic>Algorithms</topic><topic>Databases as Topic</topic><topic>DNA, Fungal</topic><topic>Fungal Proteins - genetics</topic><topic>Genes, Fungal - genetics</topic><topic>Genome, Fungal</topic><topic>Open Reading Frames</topic><topic>Reproducibility of Results</topic><topic>Saccharomyces cerevisiae</topic><topic>Saccharomyces cerevisiae - genetics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, C T</creatorcontrib><creatorcontrib>Wang, J</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Entomology Abstracts (Full 
archive)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, C T</au><au>Wang, J</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucleic Acids Res</addtitle><date>2000-07-15</date><risdate>2000</risdate><volume>28</volume><issue>14</issue><spage>2804</spage><epage>2814</epage><pages>2804-2814</pages><issn>1362-4962</issn><issn>0305-1048</issn><eissn>1362-4962</eissn><coden>NARHAD</coden><abstract>The Z curve is a three-dimensional space curve constituting the unique representation of a given DNA sequence in the sense that each can be uniquely reconstructed from the other. Based on the Z curve, a new protein coding gene-finding algorithm specific for the yeast genome at better than 95% accuracy has been proposed. Six cross-validation tests were performed to confirm the above accuracy. Using the new algorithm, the number of protein coding genes in the yeast genome is re-estimated. 
The estimate is based on the assumption that the unknown genes have similar statistical properties to the known genes. It is found that the number of protein coding genes in the 16 yeast chromosomes is ≤5645, significantly smaller than the 5800-6000 which is widely accepted, and much larger than the 4800 estimated by another group recently. The mitochondrial genes were not included into the above estimate. A codingness index called the YZ score (YZ ∈ [0,1]) is proposed to recognize protein coding genes in the yeast genome. Among the ORFs annotated in the MIPS (Munich Information Centre for Protein Sequences) database, those recognized as non-coding by the present algorithm are listed in this paper in detail. The criterion for a coding or non-coding ORF is simply decided by YZ > 0.5 or YZ < 0.5, respectively. The YZ scores for all the ORFs annotated in the MIPS database have been calculated and are available on request by sending e-mail to the corresponding author.</abstract><cop>England</cop><pub>Oxford Publishing Limited (England)</pub><pmid>10908339</pmid><doi>10.1093/nar/28.14.2804</doi><tpages>11</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1362-4962 |
ispartof | Nucleic acids research, 2000-07, Vol.28 (14), p.2804-2814 |
issn | 1362-4962; 0305-1048; 1362-4962 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_102655 |
source | MEDLINE; Oxford Journals Open Access Collection; PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Algorithms; Databases as Topic; DNA, Fungal; Fungal Proteins - genetics; Genes, Fungal - genetics; Genome, Fungal; Open Reading Frames; Reproducibility of Results; Saccharomyces cerevisiae; Saccharomyces cerevisiae - genetics |
title | Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T11%3A52%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Recognition%20of%20protein%20coding%20genes%20in%20the%20yeast%20genome%20at%20better%20than%2095%25%20accuracy%20based%20on%20the%20Z%20curve&rft.jtitle=Nucleic%20acids%20research&rft.au=Zhang,%20C%20T&rft.date=2000-07-15&rft.volume=28&rft.issue=14&rft.spage=2804&rft.epage=2814&rft.pages=2804-2814&rft.issn=1362-4962&rft.eissn=1362-4962&rft.coden=NARHAD&rft_id=info:doi/10.1093/nar/28.14.2804&rft_dat=%3Cproquest_pubme%3E17556282%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=200563125&rft_id=info:pmid/10908339&rfr_iscdi=true |
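The record states only the decision rule for the YZ score. The score lies in [0,1], and an ORF is coding if YZ > 0.5 and non-coding if YZ < 0.5. It does not say how YZ is computed from the Z curve. The Python sketch below is therefore hedged. `phase_specific_z` computes nine codon-position-specific Z-curve components (x_i, y_i, z_i for positions 1 to 3), which is an assumed feature set of the kind Z-curve gene finders use and not necessarily the paper's. `classify_orf` applies the stated threshold to a YZ value supplied by the caller. Both names are hypothetical. Returning "undecided" at exactly 0.5 is also an assumption, because the stated rule does not cover that case.

```python
def phase_specific_z(orf: str) -> list[float]:
    """Nine Z-curve components (x_i, y_i, z_i) for codon positions i = 1, 2, 3.

    Assumed feature set for illustration; a trailing partial codon is ignored.
    """
    codons = len(orf) // 3
    if codons == 0:
        raise ValueError("ORF must contain at least one complete codon")
    seq = orf.upper()
    features: list[float] = []
    for pos in range(3):
        bases = seq[pos : 3 * codons : 3]  # bases at this codon position
        a, c, g, t = (bases.count(b) / codons for b in "ACGT")
        features += [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]
    return features


def classify_orf(yz: float) -> str:
    """Apply the stated criterion: coding if YZ > 0.5, non-coding if YZ < 0.5."""
    if not 0.0 <= yz <= 1.0:
        raise ValueError("YZ score is defined on [0, 1]")
    if yz > 0.5:
        return "coding"
    if yz < 0.5:
        return "non-coding"
    return "undecided"  # YZ == 0.5 is not covered by the stated rule
```

Keeping the feature extraction separate from the threshold makes it clear which part comes from the record (the 0.5 cut-off) and which part is illustrative (the features).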