Effects of long-range correlations in DNA on sequence alignment score statistics
Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score l...
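Published in: | Journal of computational biology 2007-06, Vol.14 (5), p.655-668 |
---|---|
Main authors: | Messer, Philipp W; Bundschuh, Ralf; Vingron, Martin; Arndt, Peter F |
Format: | Article |
Language: | eng |
Subjects: | Animals; Computer Simulation; Humans; Models, Genetic; Models, Statistical; Sequence Alignment - methods; Sequence Alignment - statistics & numerical data; Sequence Alignment - trends; Sequence Analysis, DNA - methods; Sequence Analysis, DNA - statistics & numerical data; Sequence Analysis, DNA - trends; Sequence Homology, Nucleic Acid |
Online access: | Full text |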
container_end_page | 668 |
---|---|
container_issue | 5 |
container_start_page | 655 |
container_title | Journal of computational biology |
container_volume | 14 |
creator | Messer, Philipp W; Bundschuh, Ralf; Vingron, Martin; Arndt, Peter F |
description | Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score landscape, we calculate the corrections to the scale parameter lambda of the extreme value distribution of alignment scores. Our approximate analytic results are supported by a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find both the mean and the exponential tail of the score distribution for long-range correlated sequences to be substantially shifted compared to random sequences with independent nucleotides. The significance of measured alignment scores will therefore change upon incorporation of the correlations in the null model. We discuss the magnitude of this effect in a biological context. |
doi_str_mv | 10.1089/cmb.2007.R008 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_68146836</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>68146836</sourcerecordid><originalsourceid>FETCH-LOGICAL-c361t-c6858601e90cdd3715a915967d439e4d1344653d48130acf3fa85baaf44682823</originalsourceid><addsrcrecordid>eNqF0M9LwzAUwPEgipvTo1fJyVtn0jQv6XHM-QOGiug5ZGkyKm0yk-7gf2_KBh495RE-PB5fhK4pmVMi6zvTb-YlIWL-Tog8QVPKuSgkAJzmmQAUvBRigi5S-iKEMiDiHE2oAMlKgCl6WzlnzZBwcLgLfltE7bcWmxCj7fTQBp9w6_H9ywIHj5P93ltvLNZdu_W99QNOmVqchmzT0Jp0ic6c7pK9Or4z9Pmw-lg-FevXx-flYl0YBnQoDEgugVBbE9M0TFCua8prEE3Fals1lFUVcNZUkjKijWNOS77R2uVvWcqSzdDtYe8uhnxUGlTfJmO7Tnsb9kmBpFky-BfSWrC6pCzD4gBNDClF69Qutr2OP4oSNbZWubUaW6uxdfY3x8X7TW-bP32My34BS7l5YQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>19739213</pqid></control><display><type>article</type><title>Effects of long-range correlations in DNA on sequence alignment score statistics</title><source>Mary Ann Liebert Online Subscription</source><source>MEDLINE</source><creator>Messer, Philipp W ; Bundschuh, Ralf ; Vingron, Martin ; Arndt, Peter F</creator><creatorcontrib>Messer, Philipp W ; Bundschuh, Ralf ; Vingron, Martin ; Arndt, Peter F</creatorcontrib><description>Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score landscape, we calculate the corrections to the scale parameter lambda of the extreme value distribution of alignment scores. Our approximate analytic results are supported by a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. 
We find both, mean and exponential tail of the score distribution for long-range correlated sequences to be substantially shifted compared to random sequences with independent nucleotides. The significance of measured alignment scores will therefore change upon incorporation of the correlations in the null model. We discuss the magnitude of this effect in a biological context.</description><identifier>ISSN: 1066-5277</identifier><identifier>EISSN: 1557-8666</identifier><identifier>DOI: 10.1089/cmb.2007.R008</identifier><identifier>PMID: 17683266</identifier><language>eng</language><publisher>United States</publisher><subject>Animals ; Computer Simulation ; Humans ; Models, Genetic ; Models, Statistical ; Sequence Alignment - methods ; Sequence Alignment - statistics & numerical data ; Sequence Alignment - trends ; Sequence Analysis, DNA - methods ; Sequence Analysis, DNA - statistics & numerical data ; Sequence Analysis, DNA - trends ; Sequence Homology, Nucleic Acid</subject><ispartof>Journal of computational biology, 2007-06, Vol.14 (5), p.655-668</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c361t-c6858601e90cdd3715a915967d439e4d1344653d48130acf3fa85baaf44682823</citedby><cites>FETCH-LOGICAL-c361t-c6858601e90cdd3715a915967d439e4d1344653d48130acf3fa85baaf44682823</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,777,781,3029,27905,27906</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/17683266$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Messer, Philipp W</creatorcontrib><creatorcontrib>Bundschuh, Ralf</creatorcontrib><creatorcontrib>Vingron, Martin</creatorcontrib><creatorcontrib>Arndt, Peter F</creatorcontrib><title>Effects of long-range correlations in DNA on 
sequence alignment score statistics</title><title>Journal of computational biology</title><addtitle>J Comput Biol</addtitle><description>Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score landscape, we calculate the corrections to the scale parameter lambda of the extreme value distribution of alignment scores. Our approximate analytic results are supported by a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find both, mean and exponential tail of the score distribution for long-range correlated sequences to be substantially shifted compared to random sequences with independent nucleotides. The significance of measured alignment scores will therefore change upon incorporation of the correlations in the null model. 
We discuss the magnitude of this effect in a biological context.</description><subject>Animals</subject><subject>Computer Simulation</subject><subject>Humans</subject><subject>Models, Genetic</subject><subject>Models, Statistical</subject><subject>Sequence Alignment - methods</subject><subject>Sequence Alignment - statistics & numerical data</subject><subject>Sequence Alignment - trends</subject><subject>Sequence Analysis, DNA - methods</subject><subject>Sequence Analysis, DNA - statistics & numerical data</subject><subject>Sequence Analysis, DNA - trends</subject><subject>Sequence Homology, Nucleic Acid</subject><issn>1066-5277</issn><issn>1557-8666</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqF0M9LwzAUwPEgipvTo1fJyVtn0jQv6XHM-QOGiug5ZGkyKm0yk-7gf2_KBh495RE-PB5fhK4pmVMi6zvTb-YlIWL-Tog8QVPKuSgkAJzmmQAUvBRigi5S-iKEMiDiHE2oAMlKgCl6WzlnzZBwcLgLfltE7bcWmxCj7fTQBp9w6_H9ywIHj5P93ltvLNZdu_W99QNOmVqchmzT0Jp0ic6c7pK9Or4z9Pmw-lg-FevXx-flYl0YBnQoDEgugVBbE9M0TFCua8prEE3Fals1lFUVcNZUkjKijWNOS77R2uVvWcqSzdDtYe8uhnxUGlTfJmO7Tnsb9kmBpFky-BfSWrC6pCzD4gBNDClF69Qutr2OP4oSNbZWubUaW6uxdfY3x8X7TW-bP32My34BS7l5YQ</recordid><startdate>200706</startdate><enddate>200706</enddate><creator>Messer, Philipp W</creator><creator>Bundschuh, Ralf</creator><creator>Vingron, Martin</creator><creator>Arndt, Peter F</creator><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>7X8</scope></search><sort><creationdate>200706</creationdate><title>Effects of long-range correlations in DNA on sequence alignment score statistics</title><author>Messer, Philipp W ; Bundschuh, Ralf ; Vingron, Martin ; Arndt, Peter 
F</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c361t-c6858601e90cdd3715a915967d439e4d1344653d48130acf3fa85baaf44682823</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Animals</topic><topic>Computer Simulation</topic><topic>Humans</topic><topic>Models, Genetic</topic><topic>Models, Statistical</topic><topic>Sequence Alignment - methods</topic><topic>Sequence Alignment - statistics & numerical data</topic><topic>Sequence Alignment - trends</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Sequence Analysis, DNA - statistics & numerical data</topic><topic>Sequence Analysis, DNA - trends</topic><topic>Sequence Homology, Nucleic Acid</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Messer, Philipp W</creatorcontrib><creatorcontrib>Bundschuh, Ralf</creatorcontrib><creatorcontrib>Vingron, Martin</creatorcontrib><creatorcontrib>Arndt, Peter F</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of computational biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Messer, Philipp W</au><au>Bundschuh, Ralf</au><au>Vingron, Martin</au><au>Arndt, Peter F</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Effects of long-range correlations in DNA on sequence alignment 
score statistics</atitle><jtitle>Journal of computational biology</jtitle><addtitle>J Comput Biol</addtitle><date>2007-06</date><risdate>2007</risdate><volume>14</volume><issue>5</issue><spage>655</spage><epage>668</epage><pages>655-668</pages><issn>1066-5277</issn><eissn>1557-8666</eissn><abstract>Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score landscape, we calculate the corrections to the scale parameter lambda of the extreme value distribution of alignment scores. Our approximate analytic results are supported by a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find both, mean and exponential tail of the score distribution for long-range correlated sequences to be substantially shifted compared to random sequences with independent nucleotides. The significance of measured alignment scores will therefore change upon incorporation of the correlations in the null model. We discuss the magnitude of this effect in a biological context.</abstract><cop>United States</cop><pmid>17683266</pmid><doi>10.1089/cmb.2007.R008</doi><tpages>14</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1066-5277 |
ispartof | Journal of computational biology, 2007-06, Vol.14 (5), p.655-668 |
issn | 1066-5277; 1557-8666 |
language | eng |
recordid | cdi_proquest_miscellaneous_68146836 |
source | Mary Ann Liebert Online Subscription; MEDLINE |
subjects | Animals; Computer Simulation; Humans; Models, Genetic; Models, Statistical; Sequence Alignment - methods; Sequence Alignment - statistics & numerical data; Sequence Alignment - trends; Sequence Analysis, DNA - methods; Sequence Analysis, DNA - statistics & numerical data; Sequence Analysis, DNA - trends; Sequence Homology, Nucleic Acid |
title | Effects of long-range correlations in DNA on sequence alignment score statistics |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T00%3A21%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effects%20of%20long-range%20correlations%20in%20DNA%20on%20sequence%20alignment%20score%20statistics&rft.jtitle=Journal%20of%20computational%20biology&rft.au=Messer,%20Philipp%20W&rft.date=2007-06&rft.volume=14&rft.issue=5&rft.spage=655&rft.epage=668&rft.pages=655-668&rft.issn=1066-5277&rft.eissn=1557-8666&rft_id=info:doi/10.1089/cmb.2007.R008&rft_dat=%3Cproquest_cross%3E68146836%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=19739213&rft_id=info:pmid/17683266&rfr_iscdi=true |
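
The abstract states that a change in the scale parameter lambda of the extreme value distribution changes the significance of a measured alignment score. A minimal sketch of that dependence follows, assuming the standard extreme value form P(S >= s) = 1 - exp(-K m n exp(-lambda s)). The function name `alignment_pvalue` and every numeric value, including the direction and size of the shift in lambda, are hypothetical choices for illustration and are not taken from the article.

```python
import math


def alignment_pvalue(score, lam, K, m, n):
    """Return P(S >= score) for an extreme value (Gumbel) null model.

    The expected number of chance hits is E = K * m * n * exp(-lam * score),
    and the p-value is 1 - exp(-E).
    """
    expected_hits = K * m * n * math.exp(-lam * score)
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-expected_hits)


# Hypothetical values, chosen only for illustration (not from the article).
m = n = 1000            # sequence lengths
K = 0.1                 # prefactor of the extreme value distribution
score = 30.0            # measured alignment score
lam_independent = 1.0   # assumed lambda for independent nucleotides
lam_correlated = 0.9    # assumed lambda after accounting for correlations

p_independent = alignment_pvalue(score, lam_independent, K, m, n)
p_correlated = alignment_pvalue(score, lam_correlated, K, m, n)
print(f"p-value, independent null model: {p_independent:.3e}")
print(f"p-value, correlated null model:  {p_correlated:.3e}")
```

Because lambda sits in the exponent, even a small assumed shift changes the p-value of the same score by a large factor, which is why the choice of null model matters for significance estimates.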