Effects of long-range correlations in DNA on sequence alignment score statistics

Bibliographic details
Published in: Journal of computational biology, 2007-06, Vol. 14 (5), p. 655-668
Main authors: Messer, Philipp W; Bundschuh, Ralf; Vingron, Martin; Arndt, Peter F
Format: Article
Language: English
Online access: Full text
container_end_page 668
container_issue 5
container_start_page 655
container_title Journal of computational biology
container_volume 14
creator Messer, Philipp W
Bundschuh, Ralf
Vingron, Martin
Arndt, Peter F
description Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score landscape, we calculate the corrections to the scale parameter lambda of the extreme value distribution of alignment scores. Our approximate analytic results are supported by a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find both the mean and the exponential tail of the score distribution for long-range correlated sequences to be substantially shifted compared to random sequences with independent nucleotides. The significance of measured alignment scores will therefore change upon incorporation of the correlations in the null model. We discuss the magnitude of this effect in a biological context.
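Why a shift in the scale parameter lambda changes significance can be illustrated with the standard Karlin-Altschul (Gumbel) form for local alignment scores, P(S >= x) = 1 - exp(-K m n e^(-lambda x)). The sketch below assumes that form. It is not the authors' method, and every numeric value in it (K, m, n, the score, and both lambda values) is a hypothetical placeholder, not a result from the paper.

```python
import math


def gumbel_pvalue(score, lam, K, m, n):
    """Probability that the best local alignment score is >= score.

    Assumes the Karlin-Altschul / Gumbel form
        P(S >= x) = 1 - exp(-K * m * n * exp(-lam * x)),
    where lam is the scale parameter, K the prefactor, and m, n the
    lengths of the two sequences being compared.
    """
    expected_hits = K * m * n * math.exp(-lam * score)  # the E-value
    # 1 - exp(-E), written with expm1 to stay accurate when E is tiny
    return -math.expm1(-expected_hits)


def score_threshold(pvalue, lam, K, m, n):
    """Score x at which the Gumbel p-value equals `pvalue` (inverse of gumbel_pvalue)."""
    expected_hits = -math.log1p(-pvalue)  # invert p = 1 - exp(-E)
    return math.log(K * m * n / expected_hits) / lam


if __name__ == "__main__":
    # Hypothetical parameters for illustration only; not values from the paper.
    K, m, n, score = 0.1, 1000, 1000, 40.0
    lam_independent = 0.50  # hypothetical: i.i.d. nucleotide null model
    lam_correlated = 0.45   # hypothetical: a smaller lam gives a heavier tail
    for label, lam in [("independent", lam_independent), ("correlated", lam_correlated)]:
        p = gumbel_pvalue(score, lam, K, m, n)
        x01 = score_threshold(0.01, lam, K, m, n)
        print(f"{label:>11}: P(S >= {score}) = {p:.3e}, score for p = 0.01: {x01:.1f}")
```

Because lambda sits in the exponent, even a modest change in lambda alters the p-value of a fixed score by a large factor and moves the score needed to reach a given significance level. This is why the choice of null model (independent versus correlated nucleotides) matters when alignment scores are judged.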
doi_str_mv 10.1089/cmb.2007.R008
format Article
pmid 17683266
journal_abbrev J Comput Biol
publisher_place United States
page_count 14
review_status peer_reviewed
open_access free_for_read
fulltext fulltext
identifier ISSN: 1066-5277
ispartof Journal of computational biology, 2007-06, Vol.14 (5), p.655-668
issn 1066-5277
1557-8666
language eng
recordid cdi_proquest_miscellaneous_68146836
source Mary Ann Liebert Online Subscription; MEDLINE
subjects Animals
Computer Simulation
Humans
Models, Genetic
Models, Statistical
Sequence Alignment - methods
Sequence Alignment - statistics & numerical data
Sequence Alignment - trends
Sequence Analysis, DNA - methods
Sequence Analysis, DNA - statistics & numerical data
Sequence Analysis, DNA - trends
Sequence Homology, Nucleic Acid
title Effects of long-range correlations in DNA on sequence alignment score statistics
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T00%3A21%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effects%20of%20long-range%20correlations%20in%20DNA%20on%20sequence%20alignment%20score%20statistics&rft.jtitle=Journal%20of%20computational%20biology&rft.au=Messer,%20Philipp%20W&rft.date=2007-06&rft.volume=14&rft.issue=5&rft.spage=655&rft.epage=668&rft.pages=655-668&rft.issn=1066-5277&rft.eissn=1557-8666&rft_id=info:doi/10.1089/cmb.2007.R008&rft_dat=%3Cproquest_cross%3E68146836%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=19739213&rft_id=info:pmid/17683266&rfr_iscdi=true