Matching among multiple random sequences

In searching for strong homologies between multiple nucleic acid or protein sequences, researchers commonly look at fixed-length segments in common to the sequences. Such homologies form the foundation of segment-based algorithms for multiple alignment of protein sequences. The researcher uses settings of "unusualness of multiple matches" to calibrate the algorithms.

Detailed description

Bibliographic details
Published in:	Bulletin of mathematical biology 1997-05, Vol.59 (3), p.483-496
Main authors:	Naus, J.I ; Sheng, K
Format:	Article
Language:	eng
Subjects:	Algorithms; amino acid sequences; Approximation; Bioinformatics; Deoxyribonucleic acid; DNA; DNA - chemistry; Homology; Matching; Mathematical analysis; mathematical models; Mathematics; Models, Statistical; Nucleic acids; nucleotide sequences; Probability; probability analysis; Proteins; Proteins - chemistry; Random Allocation; Reproducibility of Results; RNA - chemistry; Segments; sequence homology; Sequence Homology, Amino Acid; Sequence Homology, Nucleic Acid; Statistical analysis
Online access:	Full text

container_end_page	496
container_issue	3
container_start_page	483
container_title	Bulletin of mathematical biology
container_volume	59
creator	Naus, J.I ; Sheng, K
description	In searching for strong homologies between multiple nucleic acid or protein sequences, researchers commonly look at fixed-length segments in common to the sequences. Such homologies form the foundation of segment-based algorithms for multiple alignment of protein sequences. The researcher uses settings of "unusualness of multiple matches" to calibrate the algorithms. In applications where a researcher has found a multiple matching word, statistical significance helps gauge the unusualness of the observed match. Previous approximations for the unusualness of multiple matches are based on large sample theory, and are sometimes quite inaccurate. Section 2 illustrates this inaccuracy, and provides accurate approximations for the probability of a common word in R out of R sequences. Section 3 generalizes the approximation to multiple matching in R out of S sequences. Section 4 describes a more complex approximation that incorporates exact probabilities and yields excellent accuracy; this approximation is useful for checking the simpler approximations over a range of values.
doi_str_mv	10.1007/BF02459461
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_79035884</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2094930051</sourcerecordid><originalsourceid>FETCH-LOGICAL-c276t-9771472000b14fc845cf15ca3200472362fd12a2e1e30327f4fbc284210f37ca3</originalsourceid><addsrcrecordid>eNp10U1LAzEQBuAgSq3Vi3exKIgIq8kkm4-jFqtCxYP2HNI0qVv2oya7B_-9kS0KgpcEZh5mwhuEjgm-JhiLm7spBpYrxskOGpIcIFMcwy4aYqwgk8DwPjqIcY0TVlQN0EARARLyIbp8Nq19L-rV2FRNOquubItN6cbB1MumGkf30bnauniI9rwpozva3iM0n96_TR6z2cvD0-R2llkQvM2UEIQJSJsWhHkrWW49ya2hqZTqlINfEjDgiKOYgvDMLyxIBgR7KpIboYt-7iY0aXVsdVVE68rS1K7pohYK01xKluDZH7huulCnt2lBGVBCGE_o_D8EnDMmGJd5Ule9sqGJMTivN6GoTPjUBOvvhPVvwgmfbEd2i8otf-g20tQ_7fveNNqsQhH1_BUwoRgkF-l36BeJWHpj</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>734231146</pqid></control><display><type>article</type><title>Matching among multiple random sequences</title><source>MEDLINE</source><source>Springer Nature - Complete Springer Journals</source><source>Alma/SFX Local Collection</source><creator>Naus, J.I ; Sheng, K</creator><creatorcontrib>Naus, J.I ; Sheng, K</creatorcontrib><description>In searching for strong homologies between multiple nucleic acid or protein sequences, researchers commonly look at fixed-length segments in common to the sequences. Such homologies form the foundation of segment-based algorithms for multiple alignment of protein sequences. The researcher uses settings of "unusualness of multiple matches" to calibrate the algorithms. In applications where a researcher has found a multiple matching word, statistical significance helps gauge the unusualness of the observed match. Previous approximations for the unusualness of multiple matches are based on large sample theory, and are sometimes quite inaccurate. 
Section 2 illustrates this inaccuracy, and provides accurate approximations for the probability of a common word in R out of R sequences. Section 3 generalizes the approximation to multiple matching in R out of S sequences. Section 4 describes a more complex approximation that incorporates exact probabilities and yields excellent accuracy; this approximation is useful for checking the simpler approximations over a range of values.</description><identifier>ISSN: 0092-8240</identifier><identifier>EISSN: 1522-9602</identifier><identifier>DOI: 10.1007/BF02459461</identifier><identifier>PMID: 9172825</identifier><language>eng</language><publisher>United States: Springer Nature B.V</publisher><subject>Algorithms ; amino acid sequences ; Approximation ; Bioinformatics ; Deoxyribonucleic acid ; DNA ; DNA - chemistry ; Homology ; Matching ; Mathematical analysis ; mathematical models ; Mathematics ; Models, Statistical ; Nucleic acids ; nucleotide sequences ; Probability ; probability analysis ; Proteins ; Proteins - chemistry ; Random Allocation ; Reproducibility of Results ; RNA - chemistry ; Segments ; sequence homology ; Sequence Homology, Amino Acid ; Sequence Homology, Nucleic Acid ; Statistical analysis</subject><ispartof>Bulletin of mathematical biology, 1997-05, Vol.59 (3), p.483-496</ispartof><rights>Society for Mathematical Biology 1997.</rights><rights>Society for Mathematical Biology 1997</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c276t-9771472000b14fc845cf15ca3200472362fd12a2e1e30327f4fbc284210f37ca3</citedby><cites>FETCH-LOGICAL-c276t-9771472000b14fc845cf15ca3200472362fd12a2e1e30327f4fbc284210f37ca3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/9172825$$D View 
this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Naus, J.I</creatorcontrib><creatorcontrib>Sheng, K</creatorcontrib><title>Matching among multiple random sequences</title><title>Bulletin of mathematical biology</title><addtitle>Bull Math Biol</addtitle><description>In searching for strong homologies between multiple nucleic acid or protein sequences, researchers commonly look at fixed-length segments in common to the sequences. Such homologies form the foundation of segment-based algorithms for multiple alignment of protein sequences. The researcher uses settings of "unusualness of multiple matches" to calibrate the algorithms. In applications where a researcher has found a multiple matching word, statistical significance helps gauge the unusualness of the observed match. Previous approximations for the unusualness of multiple matches are based on large sample theory, and are sometimes quite inaccurate. Section 2 illustrates this inaccuracy, and provides accurate approximations for the probability of a common word in R out of R sequences. Section 3 generalizes the approximation to multiple matching in R out of S sequences. 
Section 4 describes a more complex approximation that incorporates exact probabilities and yields excellent accuracy; this approximation is useful for checking the simpler approximations over a range of values.</description><subject>Algorithms</subject><subject>amino acid sequences</subject><subject>Approximation</subject><subject>Bioinformatics</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>DNA - chemistry</subject><subject>Homology</subject><subject>Matching</subject><subject>Mathematical analysis</subject><subject>mathematical models</subject><subject>Mathematics</subject><subject>Models, Statistical</subject><subject>Nucleic acids</subject><subject>nucleotide sequences</subject><subject>Probability</subject><subject>probability analysis</subject><subject>Proteins</subject><subject>Proteins - chemistry</subject><subject>Random Allocation</subject><subject>Reproducibility of Results</subject><subject>RNA - chemistry</subject><subject>Segments</subject><subject>sequence homology</subject><subject>Sequence Homology, Amino Acid</subject><subject>Sequence Homology, Nucleic Acid</subject><subject>Statistical analysis</subject><issn>0092-8240</issn><issn>1522-9602</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>1997</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>BENPR</sourceid><recordid>eNp10U1LAzEQBuAgSq3Vi3exKIgIq8kkm4-jFqtCxYP2HNI0qVv2oya7B_-9kS0KgpcEZh5mwhuEjgm-JhiLm7spBpYrxskOGpIcIFMcwy4aYqwgk8DwPjqIcY0TVlQN0EARARLyIbp8Nq19L-rV2FRNOquubItN6cbB1MumGkf30bnauniI9rwpozva3iM0n96_TR6z2cvD0-R2llkQvM2UEIQJSJsWhHkrWW49ya2hqZTqlINfEjDgiKOYgvDMLyxIBgR7KpIboYt-7iY0aXVsdVVE68rS1K7pohYK01xKluDZH7huulCnt2lBGVBCGE_o_D8EnDMmGJd5Ule9sqGJMTivN6GoTPjUBOvvhPVvwgmfbEd2i8otf-g20tQ_7fveNNqsQhH1_BUwoRgkF-l36BeJWHpj</recordid><startdate>199705</startdate><enddate>199705</enddate><creator>Naus, J.I</creator><creator>Sheng, K</creator><general>Springer Nature 
B.V</general><scope>FBQ</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SS</scope><scope>7TK</scope><scope>JQ2</scope><scope>K9.</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>8AO</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K7-</scope><scope>L6V</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>7X8</scope></search><sort><creationdate>199705</creationdate><title>Matching among multiple random sequences</title><author>Naus, J.I ; Sheng, K</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c276t-9771472000b14fc845cf15ca3200472362fd12a2e1e30327f4fbc284210f37ca3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>1997</creationdate><topic>Algorithms</topic><topic>amino acid sequences</topic><topic>Approximation</topic><topic>Bioinformatics</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>DNA - chemistry</topic><topic>Homology</topic><topic>Matching</topic><topic>Mathematical analysis</topic><topic>mathematical models</topic><topic>Mathematics</topic><topic>Models, Statistical</topic><topic>Nucleic acids</topic><topic>nucleotide sequences</topic><topic>Probability</topic><topic>probability 
analysis</topic><topic>Proteins</topic><topic>Proteins - chemistry</topic><topic>Random Allocation</topic><topic>Reproducibility of Results</topic><topic>RNA - chemistry</topic><topic>Segments</topic><topic>sequence homology</topic><topic>Sequence Homology, Amino Acid</topic><topic>Sequence Homology, Nucleic Acid</topic><topic>Statistical analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Naus, J.I</creatorcontrib><creatorcontrib>Sheng, K</creatorcontrib><collection>AGRIS</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Neurosciences Abstracts</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>ProQuest Central (Corporate)</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central 
Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>MEDLINE - Academic</collection><jtitle>Bulletin of mathematical biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Naus, J.I</au><au>Sheng, K</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Matching among multiple random sequences</atitle><jtitle>Bulletin of mathematical biology</jtitle><addtitle>Bull Math Biol</addtitle><date>1997-05</date><risdate>1997</risdate><volume>59</volume><issue>3</issue><spage>483</spage><epage>496</epage><pages>483-496</pages><issn>0092-8240</issn><eissn>1522-9602</eissn><abstract>In searching for 
strong homologies between multiple nucleic acid or protein sequences, researchers commonly look at fixed-length segments in common to the sequences. Such homologies form the foundation of segment-based algorithms for multiple alignment of protein sequences. The researcher uses settings of "unusualness of multiple matches" to calibrate the algorithms. In applications where a researcher has found a multiple matching word, statistical significance helps gauge the unusualness of the observed match. Previous approximations for the unusualness of multiple matches are based on large sample theory, and are sometimes quite inaccurate. Section 2 illustrates this inaccuracy, and provides accurate approximations for the probability of a common word in R out of R sequences. Section 3 generalizes the approximation to multiple matching in R out of S sequences. Section 4 describes a more complex approximation that incorporates exact probabilities and yields excellent accuracy; this approximation is useful for checking the simpler approximations over a range of values.</abstract><cop>United States</cop><pub>Springer Nature B.V</pub><pmid>9172825</pmid><doi>10.1007/BF02459461</doi><tpages>14</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0092-8240
ispartof	Bulletin of mathematical biology, 1997-05, Vol.59 (3), p.483-496
issn	0092-8240; 1522-9602
language	eng
recordid	cdi_proquest_miscellaneous_79035884
source	MEDLINE; Springer Nature - Complete Springer Journals; Alma/SFX Local Collection
subjects	Algorithms; amino acid sequences; Approximation; Bioinformatics; Deoxyribonucleic acid; DNA; DNA - chemistry; Homology; Matching; Mathematical analysis; mathematical models; Mathematics; Models, Statistical; Nucleic acids; nucleotide sequences; Probability; probability analysis; Proteins; Proteins - chemistry; Random Allocation; Reproducibility of Results; RNA - chemistry; Segments; sequence homology; Sequence Homology, Amino Acid; Sequence Homology, Nucleic Acid; Statistical analysis
title	Matching among multiple random sequences
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T11%3A55%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Matching%20among%20multiple%20random%20sequences&rft.jtitle=Bulletin%20of%20mathematical%20biology&rft.au=Naus,%20J.I&rft.date=1997-05&rft.volume=59&rft.issue=3&rft.spage=483&rft.epage=496&rft.pages=483-496&rft.issn=0092-8240&rft.eissn=1522-9602&rft_id=info:doi/10.1007/BF02459461&rft_dat=%3Cproquest_cross%3E2094930051%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=734231146&rft_id=info:pmid/9172825&rfr_iscdi=true
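
The description field of this record refers to the probability of a common word in R out of R sequences, and to its generalization to R out of S sequences. The record does not reproduce the article's approximation formulas, so the following is only a Monte Carlo sketch of the quantity being approximated, not the authors' method. It assumes independent sequences with uniformly distributed letters and counts a word as common if it occurs at any position in each sequence; the function names, the default alphabet, and the parameters `n` (sequence length), `k` (word length), and `trials` are all illustrative assumptions.

```python
import random
from collections import Counter


def words(seq, k):
    """Return the set of all length-k words (contiguous substrings) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def has_common_word(seqs, k):
    """True if at least one length-k word occurs in every sequence (R out of R)."""
    common = words(seqs[0], k)
    for s in seqs[1:]:
        common &= words(s, k)
        if not common:
            return False
    return bool(common)


def has_word_in_r_of_s(seqs, k, r):
    """True if some length-k word occurs in at least r of the given sequences."""
    counts = Counter()
    for s in seqs:
        # Each sequence contributes at most one count per distinct word.
        counts.update(words(s, k))
    return any(c >= r for c in counts.values())


def estimate_common_word_prob(R, n, k, alphabet="ACGT", trials=2000, seed=0):
    """Monte Carlo estimate of P(a length-k word is common to all R sequences).

    Assumption (not from the record): letters are i.i.d. uniform over
    `alphabet`, and each of the R sequences has the same length n.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seqs = ["".join(rng.choice(alphabet) for _ in range(n)) for _ in range(R)]
        hits += has_common_word(seqs, k)
    return hits / trials
```

A call such as `estimate_common_word_prob(R=3, n=100, k=6)` returns a simulated frequency that could be set against an analytical approximation for the same parameters. Simulation becomes slow for very small probabilities, which is one reason closed-form approximations of the kind the abstract describes are of interest.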