Hidden ancient repeats in DNA: Mapping and quantification

We have shown, in a previous paper, that tandem repeating sequences, especially triplet repeats, play a very important role in gene evolution. This result led to the formulation of the following hypothesis: most of the genomic sequences evolved through everlasting acts of tandem repeat expansions wi...

Bibliographic Details
Published in: Gene 2013-10, Vol.528 (2), p.282-287
Main authors: Frenkel, Zakharia M.; Barzily, Zeev; Volkovich, Zeev; Trifonov, Edward N.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 287
container_issue 2
container_start_page 282
container_title Gene
container_volume 528
creator Frenkel, Zakharia M.
Barzily, Zeev
Volkovich, Zeev
Trifonov, Edward N.
description We have shown, in a previous paper, that tandem repeating sequences, especially triplet repeats, play a very important role in gene evolution. This result led to the formulation of the following hypothesis: most of the genomic sequences evolved through everlasting acts of tandem repeat expansions with subsequent accumulation of changes. In order to estimate how much of the observed sequences have the repeat origin we describe the adaptation of a text segmentation algorithm, based on dynamic programming, to the mapping of the ancient expansion events. The algorithm maximizes the segmentation cost, calculated as the similarity of obtained fragments to the putative repeat sequence. In the first application of the algorithm to segmentations of genomic sequences, a significant difference between the natural sequences and the corresponding shuffled sequences is detected. The natural fragments are longer and more similar to the putative repeat sequences. As our analysis shows, the coding sequences allow for repeats only when the size of the repeated words is divisible by three. In contrast, in the non-coding sequences, all repeated word sizes are present. It was estimated, that in Escherichia coli K12 genome, about 35.5% of sequence can be detectably traced to original simple repeat ancestors. The results shed light on the genomic sequence organization, and strongly confirm the hypothesis about the crucial role of triplet expansions in gene origin and evolution.
• Text segmentation technique is applied to detect hidden ancient repeats.
• Minimal estimate of 35.5% is made for the sequences originated from simple repeats.
• Protein coding sequences contain hidden tandem repeats of unit lengths multiple of 3.
• Hidden tandem repeats in eucaryotic sequences have a variety of repeat unit lengths.
doi_str_mv 10.1016/j.gene.2013.06.059
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1429215379</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0378111913008275</els_id><sourcerecordid>1429215379</sourcerecordid><originalsourceid>FETCH-LOGICAL-c356t-462ccad4e7756c8f682997f5c8c78ff37c04389741a73a4784f13e0293cf53f13</originalsourceid><addsrcrecordid>eNp9kD1PwzAQhi0EoqXwBxhQRpYEf8SxjViq8lGkAgvMlnHOlavWSe0EiX9PqhZGbrmT7rlXugehS4ILgkl1syqWEKCgmLACVwXm6giNiRQqx5jJYzTGTMicEKJG6CylFR6Kc3qKRpRJQSlmY6Tmvq4hZCZYD6HLIrRgupT5kN2_Tm-zF9O2PiyHfZ1texM677w1nW_COTpxZp3g4tAn6OPx4X02zxdvT8-z6SK3jFddXlbUWlOXIASvrHSVpEoJx620QjrHhMUlk0qUxAhmSiFLRxhgqph1nA3zBF3vc9vYbHtInd74ZGG9NgGaPmlSUkUJZ0INKN2jNjYpRXC6jX5j4rcmWO-U6ZXeKdM7ZRpXelA2HF0d8vvPDdR_J7-OBuBuD8Dw5ZeHqNPOlYXaR7Cdrhv_X_4PbV56rw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1429215379</pqid></control><display><type>article</type><title>Hidden ancient repeats in DNA: Mapping and quantification</title><source>MEDLINE</source><source>ScienceDirect Journals (5 years ago - present)</source><creator>Frenkel, Zakharia M. ; Barzily, Zeev ; Volkovich, Zeev ; Trifonov, Edward N.</creator><creatorcontrib>Frenkel, Zakharia M. ; Barzily, Zeev ; Volkovich, Zeev ; Trifonov, Edward N.</creatorcontrib><description>We have shown, in a previous paper, that tandem repeating sequences, especially triplet repeats, play a very important role in gene evolution. This result led to the formulation of the following hypothesis: most of the genomic sequences evolved through everlasting acts of tandem repeat expansions with subsequent accumulation of changes. In order to estimate how much of the observed sequences have the repeat origin we describe the adaptation of a text segmentation algorithm, based on dynamic programming, to the mapping of the ancient expansion events. 
The algorithm maximizes the segmentation cost, calculated as the similarity of obtained fragments to the putative repeat sequence. In the first application of the algorithm to segmentations of genomic sequences, a significant difference between the natural sequences and the corresponding shuffled sequences is detected. The natural fragments are longer and more similar to the putative repeat sequences. As our analysis shows, the coding sequences allow for repeats only when the size of the repeated words is divisible by three. In contrast, in the non-coding sequences, all repeated word sizes are present. It was estimated, that in Escherichia coli K12 genome, about 35.5% of sequence can be detectably traced to original simple repeat ancestors. The results shed light on the genomic sequence organization, and strongly confirm the hypothesis about the crucial role of triplet expansions in gene origin and evolution. •Text segmentation technique is applied to detect hidden ancient repeats.•Minimal estimate of 35.5% is made for the sequences originated from simple repeats.•Protein coding sequences contain hidden tandem repeats of unit lengths multiple of 3.•Hidden tandem repeats in eucaryotic sequences have a variety of repeat unit lengths.</description><identifier>ISSN: 0378-1119</identifier><identifier>EISSN: 1879-0038</identifier><identifier>DOI: 10.1016/j.gene.2013.06.059</identifier><identifier>PMID: 23872203</identifier><language>eng</language><publisher>Netherlands: Elsevier B.V</publisher><subject>Algorithms ; Ancient repeats ; Base Sequence ; Chromosome Mapping ; Escherichia coli K12 - genetics ; Evolution, Molecular ; Genome, Bacterial ; Genome, Fungal ; Models, Genetic ; Saccharomyces cerevisiae - genetics ; Sequence Analysis, DNA ; Text mining ; Trinucleotide Repeats ; Triplet expansion</subject><ispartof>Gene, 2013-10, Vol.528 (2), p.282-287</ispartof><rights>2013 Elsevier B.V.</rights><rights>2013 Elsevier B.V. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c356t-462ccad4e7756c8f682997f5c8c78ff37c04389741a73a4784f13e0293cf53f13</citedby><cites>FETCH-LOGICAL-c356t-462ccad4e7756c8f682997f5c8c78ff37c04389741a73a4784f13e0293cf53f13</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.gene.2013.06.059$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3548,27923,27924,45994</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/23872203$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Frenkel, Zakharia M.</creatorcontrib><creatorcontrib>Barzily, Zeev</creatorcontrib><creatorcontrib>Volkovich, Zeev</creatorcontrib><creatorcontrib>Trifonov, Edward N.</creatorcontrib><title>Hidden ancient repeats in DNA: Mapping and quantification</title><title>Gene</title><addtitle>Gene</addtitle><description>We have shown, in a previous paper, that tandem repeating sequences, especially triplet repeats, play a very important role in gene evolution. This result led to the formulation of the following hypothesis: most of the genomic sequences evolved through everlasting acts of tandem repeat expansions with subsequent accumulation of changes. In order to estimate how much of the observed sequences have the repeat origin we describe the adaptation of a text segmentation algorithm, based on dynamic programming, to the mapping of the ancient expansion events. The algorithm maximizes the segmentation cost, calculated as the similarity of obtained fragments to the putative repeat sequence. In the first application of the algorithm to segmentations of genomic sequences, a significant difference between the natural sequences and the corresponding shuffled sequences is detected. 
The natural fragments are longer and more similar to the putative repeat sequences. As our analysis shows, the coding sequences allow for repeats only when the size of the repeated words is divisible by three. In contrast, in the non-coding sequences, all repeated word sizes are present. It was estimated, that in Escherichia coli K12 genome, about 35.5% of sequence can be detectably traced to original simple repeat ancestors. The results shed light on the genomic sequence organization, and strongly confirm the hypothesis about the crucial role of triplet expansions in gene origin and evolution. •Text segmentation technique is applied to detect hidden ancient repeats.•Minimal estimate of 35.5% is made for the sequences originated from simple repeats.•Protein coding sequences contain hidden tandem repeats of unit lengths multiple of 3.•Hidden tandem repeats in eucaryotic sequences have a variety of repeat unit lengths.</description><subject>Algorithms</subject><subject>Ancient repeats</subject><subject>Base Sequence</subject><subject>Chromosome Mapping</subject><subject>Escherichia coli K12 - genetics</subject><subject>Evolution, Molecular</subject><subject>Genome, Bacterial</subject><subject>Genome, Fungal</subject><subject>Models, Genetic</subject><subject>Saccharomyces cerevisiae - genetics</subject><subject>Sequence Analysis, DNA</subject><subject>Text mining</subject><subject>Trinucleotide Repeats</subject><subject>Triplet 
expansion</subject><issn>0378-1119</issn><issn>1879-0038</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kD1PwzAQhi0EoqXwBxhQRpYEf8SxjViq8lGkAgvMlnHOlavWSe0EiX9PqhZGbrmT7rlXugehS4ILgkl1syqWEKCgmLACVwXm6giNiRQqx5jJYzTGTMicEKJG6CylFR6Kc3qKRpRJQSlmY6Tmvq4hZCZYD6HLIrRgupT5kN2_Tm-zF9O2PiyHfZ1texM677w1nW_COTpxZp3g4tAn6OPx4X02zxdvT8-z6SK3jFddXlbUWlOXIASvrHSVpEoJx620QjrHhMUlk0qUxAhmSiFLRxhgqph1nA3zBF3vc9vYbHtInd74ZGG9NgGaPmlSUkUJZ0INKN2jNjYpRXC6jX5j4rcmWO-U6ZXeKdM7ZRpXelA2HF0d8vvPDdR_J7-OBuBuD8Dw5ZeHqNPOlYXaR7Cdrhv_X_4PbV56rw</recordid><startdate>20131010</startdate><enddate>20131010</enddate><creator>Frenkel, Zakharia M.</creator><creator>Barzily, Zeev</creator><creator>Volkovich, Zeev</creator><creator>Trifonov, Edward N.</creator><general>Elsevier B.V</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20131010</creationdate><title>Hidden ancient repeats in DNA: Mapping and quantification</title><author>Frenkel, Zakharia M. 
; Barzily, Zeev ; Volkovich, Zeev ; Trifonov, Edward N.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c356t-462ccad4e7756c8f682997f5c8c78ff37c04389741a73a4784f13e0293cf53f13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithms</topic><topic>Ancient repeats</topic><topic>Base Sequence</topic><topic>Chromosome Mapping</topic><topic>Escherichia coli K12 - genetics</topic><topic>Evolution, Molecular</topic><topic>Genome, Bacterial</topic><topic>Genome, Fungal</topic><topic>Models, Genetic</topic><topic>Saccharomyces cerevisiae - genetics</topic><topic>Sequence Analysis, DNA</topic><topic>Text mining</topic><topic>Trinucleotide Repeats</topic><topic>Triplet expansion</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Frenkel, Zakharia M.</creatorcontrib><creatorcontrib>Barzily, Zeev</creatorcontrib><creatorcontrib>Volkovich, Zeev</creatorcontrib><creatorcontrib>Trifonov, Edward N.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Gene</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Frenkel, Zakharia M.</au><au>Barzily, Zeev</au><au>Volkovich, Zeev</au><au>Trifonov, Edward N.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Hidden ancient repeats in DNA: Mapping and quantification</atitle><jtitle>Gene</jtitle><addtitle>Gene</addtitle><date>2013-10-10</date><risdate>2013</risdate><volume>528</volume><issue>2</issue><spage>282</spage><epage>287</epage><pages>282-287</pages><issn>0378-1119</issn><eissn>1879-0038</eissn><abstract>We have shown, in a 
previous paper, that tandem repeating sequences, especially triplet repeats, play a very important role in gene evolution. This result led to the formulation of the following hypothesis: most of the genomic sequences evolved through everlasting acts of tandem repeat expansions with subsequent accumulation of changes. In order to estimate how much of the observed sequences have the repeat origin we describe the adaptation of a text segmentation algorithm, based on dynamic programming, to the mapping of the ancient expansion events. The algorithm maximizes the segmentation cost, calculated as the similarity of obtained fragments to the putative repeat sequence. In the first application of the algorithm to segmentations of genomic sequences, a significant difference between the natural sequences and the corresponding shuffled sequences is detected. The natural fragments are longer and more similar to the putative repeat sequences. As our analysis shows, the coding sequences allow for repeats only when the size of the repeated words is divisible by three. In contrast, in the non-coding sequences, all repeated word sizes are present. It was estimated, that in Escherichia coli K12 genome, about 35.5% of sequence can be detectably traced to original simple repeat ancestors. The results shed light on the genomic sequence organization, and strongly confirm the hypothesis about the crucial role of triplet expansions in gene origin and evolution. •Text segmentation technique is applied to detect hidden ancient repeats.•Minimal estimate of 35.5% is made for the sequences originated from simple repeats.•Protein coding sequences contain hidden tandem repeats of unit lengths multiple of 3.•Hidden tandem repeats in eucaryotic sequences have a variety of repeat unit lengths.</abstract><cop>Netherlands</cop><pub>Elsevier B.V</pub><pmid>23872203</pmid><doi>10.1016/j.gene.2013.06.059</doi><tpages>6</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0378-1119
ispartof Gene, 2013-10, Vol.528 (2), p.282-287
issn 0378-1119
1879-0038
language eng
recordid cdi_proquest_miscellaneous_1429215379
source MEDLINE; ScienceDirect Journals (5 years ago - present)
subjects Algorithms
Ancient repeats
Base Sequence
Chromosome Mapping
Escherichia coli K12 - genetics
Evolution, Molecular
Genome, Bacterial
Genome, Fungal
Models, Genetic
Saccharomyces cerevisiae - genetics
Sequence Analysis, DNA
Text mining
Trinucleotide Repeats
Triplet expansion
title Hidden ancient repeats in DNA: Mapping and quantification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T10%3A52%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hidden%20ancient%20repeats%20in%20DNA:%20Mapping%20and%20quantification&rft.jtitle=Gene&rft.au=Frenkel,%20Zakharia%20M.&rft.date=2013-10-10&rft.volume=528&rft.issue=2&rft.spage=282&rft.epage=287&rft.pages=282-287&rft.issn=0378-1119&rft.eissn=1879-0038&rft_id=info:doi/10.1016/j.gene.2013.06.059&rft_dat=%3Cproquest_cross%3E1429215379%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1429215379&rft_id=info:pmid/23872203&rft_els_id=S0378111913008275&rfr_iscdi=true
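
The description field above mentions a dynamic-programming segmentation that maximizes the similarity of fragments to a putative repeat. The record does not give the scoring function, so the following is only a minimal sketch of the general idea, not the authors' algorithm. The names `period_score` and `segment_repeats`, the +1/-1 scoring, and the `min_score` threshold are all assumptions made for illustration.

```python
def period_score(fragment, period, mismatch_cost=1.0):
    """Assumed similarity measure: compare each base with the base one
    repeat unit earlier; +1 for a match, -mismatch_cost otherwise."""
    score = 0.0
    for i in range(period, len(fragment)):
        score += 1.0 if fragment[i] == fragment[i - period] else -mismatch_cost
    return score


def segment_repeats(seq, periods=(1, 2, 3), max_len=60, min_score=4.0):
    """Choose non-overlapping fragments that maximize the summed score.

    best[end] is the best total score for seq[:end]; a position may be
    left unassigned at no cost. Returns (total_score, fragments), where
    each fragment is (start, end, period) with end exclusive.
    """
    n = len(seq)
    best = [0.0] * (n + 1)
    back = [None] * (n + 1)  # None: position end-1 is left unassigned
    for end in range(1, n + 1):
        best[end] = best[end - 1]
        for start in range(max(0, end - max_len), end):
            fragment = seq[start:end]
            for period in periods:
                s = period_score(fragment, period)
                if s >= min_score and best[start] + s > best[end]:
                    best[end] = best[start] + s
                    back[end] = (start, period)
    # Trace the chosen fragments back from the end of the sequence.
    fragments = []
    end = n
    while end > 0:
        if back[end] is None:
            end -= 1
        else:
            start, period = back[end]
            fragments.append((start, end, period))
            end = start
    fragments.reverse()
    return best[n], fragments


# Toy input: five copies of CAG between two short non-repetitive flanks.
toy = "ACGT" + "CAG" * 5 + "TGCA"
total, found = segment_repeats(toy)
covered = sum(end - start for start, end, _ in found) / len(toy)
print(total, found, round(covered, 3))
```

On the toy input the sketch reports a single fragment with repeat unit length 3 that covers the CAG run. A coverage fraction like `covered`, compared between a natural sequence and its shuffled counterpart, is the kind of quantity the description refers to, although the published estimate relies on the authors' own scoring.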