Rigorous pattern-recognition methods for DNA sequences: Analysis of promoter sequences from Escherichia coli

The basic nature of the sequence features that define a promoter sequence for Escherichia coli RNA polymerase have been established by a variety of biochemical and genetic methods. We have developed rigorous analytical methods for finding unknown patterns that occur imperfectly in a set of several s...
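The summary above mentions methods for finding unknown patterns that occur imperfectly across a set of sequences. The following is only a minimal sketch of that general idea, not the authors' algorithm: it enumerates every word of length `k` and counts how many sequences contain it with at most `max_mismatch` substitutions. The function names, the toy sequences, and the parameter values are all hypothetical and chosen purely for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(word, seq, max_mismatch):
    """True if some window of seq matches word with at most max_mismatch substitutions."""
    k = len(word)
    return any(
        hamming(word, seq[i:i + k]) <= max_mismatch
        for i in range(len(seq) - k + 1)
    )


def best_consensus(seqs, k, max_mismatch, alphabet="ACGT"):
    """Brute-force search: the word of length k found (imperfectly) in the most sequences."""
    best_word, best_count = None, -1
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        count = sum(occurs(word, s, max_mismatch) for s in seqs)
        if count > best_count:  # ties keep the first word in enumeration order
            best_word, best_count = word, count
    return best_word, best_count


# Hypothetical toy sequences (not data from the paper); each contains a
# hexamer within one substitution of TATAAT.
toy = ["GGTATAATGG", "CCTATGATCC", "AATACAATAA"]
word, count = best_consensus(toy, k=6, max_mismatch=1)
print(word, count)
```

The exhaustive enumeration is only practical for short words, since the number of candidates grows as the alphabet size raised to `k`. Passing a smaller `alphabet` to sequences recoded in that alphabet would be one way to experiment with the sub-alphabet idea mentioned in the abstract, although the specific sub-alphabet used by the authors is not given in this record.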

Bibliographic details
Published in: Journal of molecular biology 1985-01, Vol.186 (1), p.117-128
Main authors: Galas, David J., Eggert, Mark, Waterman, Michael S.
Format: Article
Language: eng
Online access: Full text
container_end_page 128
container_issue 1
container_start_page 117
container_title Journal of molecular biology
container_volume 186
creator Galas, David J.
Eggert, Mark
Waterman, Michael S.
description The basic nature of the sequence features that define a promoter sequence for Escherichia coli RNA polymerase have been established by a variety of biochemical and genetic methods. We have developed rigorous analytical methods for finding unknown patterns that occur imperfectly in a set of several sequences, and have used them to examine a set of bacterial promoters. The algorithm easily discovers the “consensus” sequences for the −10 and −35 regions, which are essentially identical to the results of previous analyses, but requires no prior assumptions about the common patterns. By explicitly specifying the nature of the search for consensus sequences, we give a rigorous definition to this concept that should be widely applicable. We also have provided estimates for the statistical significance of common patterns discovered in sets of sequences. In addition to providing a rigorous basis for defining known consensus regions, we have found additional features in these promoters that may have functional significance. These added features were located on either side of the −35 region. The pattern 5′, or upstream, from the −35 region was found using the standard alphabet (A, G, C and T), but the pattern between the −10 and the −35 regions was detectable only in a sub-alphabet. Recent results relating DNA sequence to helix conformation suggest that the former (upstream) pattern may have a functional significance. Possible roles in promoter function are discussed in this light, and an observation of altered promoter function involving the upstream region is reported that appears to support the suggestion of function in at least one case.
doi_str_mv 10.1016/0022-2836(85)90262-1
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_proquest_miscellaneous_76532871</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>0022283685902621</els_id><sourcerecordid>14338478</sourcerecordid><originalsourceid>FETCH-LOGICAL-e345t-d1e5596ccfbc3d01f2125f0233f59aeec221d5ce5aaf41b378db26a4b505d8ca3</originalsourceid><addsrcrecordid>eNqFkkFv1DAQhS1EVZbCPwDhA0L0EBjbceJwQFqVtiBVIAE9W4493jVK4q2dReq_x2FX5cjJkt_n8cx7Q8gLBu8YsOY9AOcVV6J5q-R5B7zhFXtEVgxUV6lGqMdk9YA8IU9z_gUAUtTqlJyKDlSjuhUZvodNTHGf6c7MM6apSmjjZgpziBMdcd5Gl6mPiX76uqYZ7_Y4Wcwf6Hoyw30OmUZPdymOsTz-p1NfruhltltMwW6DoTYO4Rk58WbI-Px4npHbq8ufF5-rm2_XXy7WNxWKWs6VYyhl11jreyscMM8Zlx64EF52BtFyzpy0KI3xNetFq1zPG1P3EqRT1ogz8uZQtzRWGsqzHkO2OAxmwjKqbhspuGrZf0FWC6HqVhXw5RHc9yM6vUthNOleH30s-uujbrI1g09msiE_YEryuhML9uqAeRO12aSC3P7gwASwVgrolo8-Hggs_vwOmHS2YbHUhRLMrF0MmoFeFkAv6eol3VJf_10AzcQfE_Og-g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>14338478</pqid></control><display><type>article</type><title>Rigorous pattern-recognition methods for DNA sequences: Analysis of promoter sequences from Escherichia coli</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Galas, David J. ; Eggert, Mark ; Waterman, Michael S.</creator><creatorcontrib>Galas, David J. ; Eggert, Mark ; Waterman, Michael S.</creatorcontrib><description>The basic nature of the sequence features that define a promoter sequence for Escherichia coli RNA polymerase have been established by a variety of biochemical and genetic methods. We have developed rigorous analytical methods for finding unknown patterns that occur imperfectly in a set of several sequences, and have used them to examine a set of bacterial promoters. 
The algorithm easily discovers the “consensus” sequences for the −10 and −35 regions, which are essentially identical to the results of previous analyses, but requires no prior assumptions about the common patterns. By explicitly specifying the nature of the search for consensus sequences, we give a rigorous definition to this concept that should be widely applicable. We also have provided estimates for the statistical significance of common patterns discovered in sets of sequences. In addition to providing a rigorous basis for defining known consensus regions, we have found additional features in these promoters that may have functional significance. These added features were located on either side of the −35 region. The pattern 5′, or upstream, from the −35 region was found using the standard alphabet (A, G, C and T), but the pattern between the −10 and the −35 regions was detectable only in a sub-alphabet. Recent results relating DNA sequence to helix conformation suggest that the former (upstream) pattern may have a functional significance. Possible roles in promoter function are discussed in this light, and an observation of altered promoter function involving the upstream region is reported that appears to support the suggestion of function in at least one case.</description><identifier>ISSN: 0022-2836</identifier><identifier>EISSN: 1089-8638</identifier><identifier>DOI: 10.1016/0022-2836(85)90262-1</identifier><identifier>PMID: 3908689</identifier><identifier>CODEN: JMOBAK</identifier><language>eng</language><publisher>Oxford: Elsevier Ltd</publisher><subject>algorithms ; Analytical, structural and metabolic biochemistry ; Base Sequence ; Biological and medical sciences ; Biotechnology ; computer analysis ; computer techniques ; DNA ; DNA conformation ; DNA, Bacterial - genetics ; Dna, deoxyribonucleoproteins ; Escherichia coli ; Escherichia coli - genetics ; Fundamental and applied biological sciences. 
Psychology ; Genetic engineering ; Genetic technics ; Methods ; Methods. Procedures. Technologies ; molecular genetics ; Nucleic acids ; nucleotide sequences ; Pattern Recognition, Automated ; promoter regions ; Promoter Regions, Genetic ; sequence alignment ; Sequence Homology, Nucleic Acid ; Synthetic digonucleotides and genes. Sequencing ; Transcription, Genetic</subject><ispartof>Journal of molecular biology, 1985-01, Vol.186 (1), p.117-128</ispartof><rights>1985</rights><rights>1986 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/0022283685902621$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3536,27903,27904,65309</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=8524939$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/3908689$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Galas, David J.</creatorcontrib><creatorcontrib>Eggert, Mark</creatorcontrib><creatorcontrib>Waterman, Michael S.</creatorcontrib><title>Rigorous pattern-recognition methods for DNA sequences: Analysis of promoter sequences from Escherichia coli</title><title>Journal of molecular biology</title><addtitle>J Mol Biol</addtitle><description>The basic nature of the sequence features that define a promoter sequence for Escherichia coli RNA polymerase have been established by a variety of biochemical and genetic methods. We have developed rigorous analytical methods for finding unknown patterns that occur imperfectly in a set of several sequences, and have used them to examine a set of bacterial promoters. 
The algorithm easily discovers the “consensus” sequences for the −10 and −35 regions, which are essentially identical to the results of previous analyses, but requires no prior assumptions about the common patterns. By explicitly specifying the nature of the search for consensus sequences, we give a rigorous definition to this concept that should be widely applicable. We also have provided estimates for the statistical significance of common patterns discovered in sets of sequences. In addition to providing a rigorous basis for defining known consensus regions, we have found additional features in these promoters that may have functional significance. These added features were located on either side of the −35 region. The pattern 5′, or upstream, from the −35 region was found using the standard alphabet (A, G, C and T), but the pattern between the −10 and the −35 regions was detectable only in a sub-alphabet. Recent results relating DNA sequence to helix conformation suggest that the former (upstream) pattern may have a functional significance. Possible roles in promoter function are discussed in this light, and an observation of altered promoter function involving the upstream region is reported that appears to support the suggestion of function in at least one case.</description><subject>algorithms</subject><subject>Analytical, structural and metabolic biochemistry</subject><subject>Base Sequence</subject><subject>Biological and medical sciences</subject><subject>Biotechnology</subject><subject>computer analysis</subject><subject>computer techniques</subject><subject>DNA</subject><subject>DNA conformation</subject><subject>DNA, Bacterial - genetics</subject><subject>Dna, deoxyribonucleoproteins</subject><subject>Escherichia coli</subject><subject>Escherichia coli - genetics</subject><subject>Fundamental and applied biological sciences. 
Psychology</subject><subject>Genetic engineering</subject><subject>Genetic technics</subject><subject>Methods</subject><subject>Methods. Procedures. Technologies</subject><subject>molecular genetics</subject><subject>Nucleic acids</subject><subject>nucleotide sequences</subject><subject>Pattern Recognition, Automated</subject><subject>promoter regions</subject><subject>Promoter Regions, Genetic</subject><subject>sequence alignment</subject><subject>Sequence Homology, Nucleic Acid</subject><subject>Synthetic digonucleotides and genes. Sequencing</subject><subject>Transcription, Genetic</subject><issn>0022-2836</issn><issn>1089-8638</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>1985</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkkFv1DAQhS1EVZbCPwDhA0L0EBjbceJwQFqVtiBVIAE9W4493jVK4q2dReq_x2FX5cjJkt_n8cx7Q8gLBu8YsOY9AOcVV6J5q-R5B7zhFXtEVgxUV6lGqMdk9YA8IU9z_gUAUtTqlJyKDlSjuhUZvodNTHGf6c7MM6apSmjjZgpziBMdcd5Gl6mPiX76uqYZ7_Y4Wcwf6Hoyw30OmUZPdymOsTz-p1NfruhltltMwW6DoTYO4Rk58WbI-Px4npHbq8ufF5-rm2_XXy7WNxWKWs6VYyhl11jreyscMM8Zlx64EF52BtFyzpy0KI3xNetFq1zPG1P3EqRT1ogz8uZQtzRWGsqzHkO2OAxmwjKqbhspuGrZf0FWC6HqVhXw5RHc9yM6vUthNOleH30s-uujbrI1g09msiE_YEryuhML9uqAeRO12aSC3P7gwASwVgrolo8-Hggs_vwOmHS2YbHUhRLMrF0MmoFeFkAv6eol3VJf_10AzcQfE_Og-g</recordid><startdate>19850101</startdate><enddate>19850101</enddate><creator>Galas, David J.</creator><creator>Eggert, Mark</creator><creator>Waterman, Michael S.</creator><general>Elsevier Ltd</general><general>Elsevier</general><scope>FBQ</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>7QL</scope><scope>7TM</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>19850101</creationdate><title>Rigorous pattern-recognition methods for DNA sequences: Analysis of promoter sequences from Escherichia 
coli</title><author>Galas, David J. ; Eggert, Mark ; Waterman, Michael S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-e345t-d1e5596ccfbc3d01f2125f0233f59aeec221d5ce5aaf41b378db26a4b505d8ca3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>1985</creationdate><topic>algorithms</topic><topic>Analytical, structural and metabolic biochemistry</topic><topic>Base Sequence</topic><topic>Biological and medical sciences</topic><topic>Biotechnology</topic><topic>computer analysis</topic><topic>computer techniques</topic><topic>DNA</topic><topic>DNA conformation</topic><topic>DNA, Bacterial - genetics</topic><topic>Dna, deoxyribonucleoproteins</topic><topic>Escherichia coli</topic><topic>Escherichia coli - genetics</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Genetic engineering</topic><topic>Genetic technics</topic><topic>Methods</topic><topic>Methods. Procedures. Technologies</topic><topic>molecular genetics</topic><topic>Nucleic acids</topic><topic>nucleotide sequences</topic><topic>Pattern Recognition, Automated</topic><topic>promoter regions</topic><topic>Promoter Regions, Genetic</topic><topic>sequence alignment</topic><topic>Sequence Homology, Nucleic Acid</topic><topic>Synthetic digonucleotides and genes. 
Sequencing</topic><topic>Transcription, Genetic</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Galas, David J.</creatorcontrib><creatorcontrib>Eggert, Mark</creatorcontrib><creatorcontrib>Waterman, Michael S.</creatorcontrib><collection>AGRIS</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of molecular biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Galas, David J.</au><au>Eggert, Mark</au><au>Waterman, Michael S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Rigorous pattern-recognition methods for DNA sequences: Analysis of promoter sequences from Escherichia coli</atitle><jtitle>Journal of molecular biology</jtitle><addtitle>J Mol Biol</addtitle><date>1985-01-01</date><risdate>1985</risdate><volume>186</volume><issue>1</issue><spage>117</spage><epage>128</epage><pages>117-128</pages><issn>0022-2836</issn><eissn>1089-8638</eissn><coden>JMOBAK</coden><abstract>The basic nature of the sequence features that define a promoter sequence for Escherichia coli RNA polymerase have been established by a variety of biochemical and genetic methods. 
We have developed rigorous analytical methods for finding unknown patterns that occur imperfectly in a set of several sequences, and have used them to examine a set of bacterial promoters. The algorithm easily discovers the “consensus” sequences for the −10 and −35 regions, which are essentially identical to the results of previous analyses, but requires no prior assumptions about the common patterns. By explicitly specifying the nature of the search for consensus sequences, we give a rigorous definition to this concept that should be widely applicable. We also have provided estimates for the statistical significance of common patterns discovered in sets of sequences. In addition to providing a rigorous basis for defining known consensus regions, we have found additional features in these promoters that may have functional significance. These added features were located on either side of the −35 region. The pattern 5′, or upstream, from the −35 region was found using the standard alphabet (A, G, C and T), but the pattern between the −10 and the −35 regions was detectable only in a sub-alphabet. Recent results relating DNA sequence to helix conformation suggest that the former (upstream) pattern may have a functional significance. Possible roles in promoter function are discussed in this light, and an observation of altered promoter function involving the upstream region is reported that appears to support the suggestion of function in at least one case.</abstract><cop>Oxford</cop><pub>Elsevier Ltd</pub><pmid>3908689</pmid><doi>10.1016/0022-2836(85)90262-1</doi><tpages>12</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0022-2836
ispartof Journal of molecular biology, 1985-01, Vol.186 (1), p.117-128
issn 0022-2836
1089-8638
language eng
recordid cdi_proquest_miscellaneous_76532871
source MEDLINE; Elsevier ScienceDirect Journals
subjects algorithms
Analytical, structural and metabolic biochemistry
Base Sequence
Biological and medical sciences
Biotechnology
computer analysis
computer techniques
DNA
DNA conformation
DNA, Bacterial - genetics
Dna, deoxyribonucleoproteins
Escherichia coli
Escherichia coli - genetics
Fundamental and applied biological sciences. Psychology
Genetic engineering
Genetic technics
Methods
Methods. Procedures. Technologies
molecular genetics
Nucleic acids
nucleotide sequences
Pattern Recognition, Automated
promoter regions
Promoter Regions, Genetic
sequence alignment
Sequence Homology, Nucleic Acid
Synthetic digonucleotides and genes. Sequencing
Transcription, Genetic
title Rigorous pattern-recognition methods for DNA sequences: Analysis of promoter sequences from Escherichia coli
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T10%3A07%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Rigorous%20pattern-recognition%20methods%20for%20DNA%20sequences:%20Analysis%20of%20promoter%20sequences%20from%20Escherichia%20coli&rft.jtitle=Journal%20of%20molecular%20biology&rft.au=Galas,%20David%20J.&rft.date=1985-01-01&rft.volume=186&rft.issue=1&rft.spage=117&rft.epage=128&rft.pages=117-128&rft.issn=0022-2836&rft.eissn=1089-8638&rft.coden=JMOBAK&rft_id=info:doi/10.1016/0022-2836(85)90262-1&rft_dat=%3Cproquest_pubme%3E14338478%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=14338478&rft_id=info:pmid/3908689&rft_els_id=0022283685902621&rfr_iscdi=true
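The `url` field above is a link-resolver address whose query string carries the citation as key/value pairs (`rft.jtitle`, `rft.volume`, `rft_id`, and so on). As a rough sketch, assuming only Python's standard library, the pairs can be decoded as follows. The URL is a shortened copy of the one in the record, and the helper name `openurl_fields` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the link-resolver URL from the record (most parameters omitted).
url = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=Journal%20of%20molecular%20biology"
    "&rft.au=Galas,%20David%20J."
    "&rft.date=1985-01-01&rft.volume=186&rft.issue=1"
    "&rft.spage=117&rft.epage=128"
    "&rft_id=info:doi/10.1016/0022-2836(85)90262-1"
    "&rft_id=info:pmid/3908689"
)


def openurl_fields(link):
    """Return the query parameters as a dict of lists, since keys may repeat."""
    return parse_qs(urlsplit(link).query)


fields = openurl_fields(url)
for key, values in fields.items():
    print(key, values)
```

Because `rft_id` appears more than once in the query string, a dict of lists is used here so that both the DOI and the PubMed identifier are kept.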