Genomic leftovers: identifying novel microsatellites, over-represented motifs and functional elements in the human genome

The human genome is 99% complete. This study contributes to filling the 1% gap by enriching previously unknown repeat regions called microsatellites (MST). We devised a Global MST Enrichment (GME) kit to enrich and next-generation sequence 2 colorectal cell lines and 16 normal human samples to illustrate it...

Bibliographic details
Published in: Scientific reports 2016-06, Vol.6 (1), p.27722-27722, Article 27722
Main authors: Fonville, Natalie C., Velmurugan, Karthik Raja, Tae, Hongseok, Vaksman, Zalman, McIver, Lauren J., Garner, Harold R.
Format: Article
Language: eng
Online access: Full text
container_end_page 27722
container_issue 1
container_start_page 27722
container_title Scientific reports
container_volume 6
creator Fonville, Natalie C.
Velmurugan, Karthik Raja
Tae, Hongseok
Vaksman, Zalman
McIver, Lauren J.
Garner, Harold R.
description The human genome is 99% complete. This study contributes to filling the 1% gap by enriching previously unknown repeat regions called microsatellites (MST). We devised a Global MST Enrichment (GME) kit to enrich and next-generation sequence 2 colorectal cell lines and 16 normal human samples to illustrate its utility in identifying contigs from reads that do not map to the genome reference. The analysis of these samples yielded 790 novel extra-referential concordant contigs that are observed in more than one sample. We searched for evidence of functional elements in the concordant contigs in two ways: (1) BLAST-ing each contig against normal RNA-Seq samples, (2) checking for predicted functional elements using GlimmerHMM. Of the 790 concordant contigs, 37 had an exact match to at least one RNA-Seq read; 15 aligned to more than 100 RNA-Seq reads. Of the 249 concordant contigs predicted by GlimmerHMM to have functional elements, 6 had at least one exact RNA-Seq match. BLAST-ing these novel contigs against all publicly available sequences confirmed that they were found in human and chimpanzee BAC and FOSMID clones sequenced as part of the original human genome project. These extra-referential contigs predominantly contained pentameric repeats, especially two motifs: AATGG and GTGGA.
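The abstract reports that the extra-referential contigs are dominated by pentameric repeats, especially AATGG and GTGGA. As a rough illustration of what "a contig containing a pentameric repeat" means, the following Python sketch measures the longest uninterrupted tandem run of each motif in a sequence. It is not the authors' pipeline: the function names and the toy contig are hypothetical, and the sketch ignores reverse complements, mismatches, and rotated forms of the motif.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    # One or more consecutive copies of the motif, matched as a single run.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [
        len(match.group(0)) // len(motif)
        for match in pattern.finditer(sequence.upper())
    ]
    return max(runs, default=0)


def motif_summary(sequence: str, motifs=("AATGG", "GTGGA")) -> dict:
    """Map each motif to its longest tandem run in `sequence`."""
    return {motif: longest_tandem_run(sequence, motif) for motif in motifs}


# Hypothetical toy contig, made up for illustration only (not data from the study).
toy_contig = "CCT" + "AATGG" * 4 + "TTC" + "GTGGA" * 2 + "A"
print(motif_summary(toy_contig))
```

A real analysis would more likely rely on a dedicated repeat finder and would treat both strands; the regular expression here only conveys the idea of a perfect tandem run.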
doi_str_mv 10.1038/srep27722
format Article
pmid 27278669
publisher London: Nature Publishing Group UK
date 2016-06-09
rights The Author(s) 2016
oa free_for_read
toplevel peer_reviewed; online_resources
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4899811/
fulltext fulltext
identifier ISSN: 2045-2322
ispartof Scientific reports, 2016-06, Vol.6 (1), p.27722-27722, Article 27722
issn 2045-2322
2045-2322
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4899811
source MEDLINE; Nature Free; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry; Springer Nature OA Free Journals
subjects 631/114/2785/2302
631/208/212/2302
Algorithms
Animals
Cell Line
Contig Mapping
Design
Genome, Human
Genomes
Genomics
Humanities and Social Sciences
Humans
Microsatellite Repeats
multidisciplinary
Pan troglodytes - genetics
Science
Sequence Analysis, DNA - methods
Sequence Analysis, RNA - methods
title Genomic leftovers: identifying novel microsatellites, over-represented motifs and functional elements in the human genome
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T00%3A18%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Genomic%20leftovers:%20identifying%20novel%20microsatellites,%20over-represented%20motifs%20and%20functional%20elements%20in%20the%20human%20genome&rft.jtitle=Scientific%20reports&rft.au=Fonville,%20Natalie%20C.&rft.date=2016-06-09&rft.volume=6&rft.issue=1&rft.spage=27722&rft.epage=27722&rft.pages=27722-27722&rft.artnum=27722&rft.issn=2045-2322&rft.eissn=2045-2322&rft_id=info:doi/10.1038/srep27722&rft_dat=%3Cproquest_pubme%3E4084347441%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1795535075&rft_id=info:pmid/27278669&rfr_iscdi=true
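The link above is an OpenURL (`url_ver=Z39.88-2004`) whose `rft.*` and `rft_id` query parameters carry the citation metadata. As a minimal sketch of how those fields could be read back out, the following Python uses only the standard library. The helper names are hypothetical, and `example_url` is a shortened version of the link in this record, kept to a few parameters for readability.

```python
from urllib.parse import parse_qs, urlsplit


def openurl_fields(url: str) -> dict:
    """Return the referent (rft) parameters of an OpenURL as lists of decoded values."""
    parsed = parse_qs(urlsplit(url).query)
    return {key: values for key, values in parsed.items() if key.startswith("rft")}


def identifiers(fields: dict) -> dict:
    """Split 'info:<scheme>/<id>' values of rft_id into a scheme -> id mapping."""
    ids = {}
    for value in fields.get("rft_id", []):
        if value.startswith("info:"):
            scheme, _, ident = value[len("info:"):].partition("/")
            if ident:  # skip empty identifiers such as 'info:oai/'
                ids[scheme] = ident
    return ids


# Shortened from the url field of this record; most parameters are omitted.
example_url = (
    "https://sfx.bib-bvb.de/sfx_tum?url_ver=Z39.88-2004"
    "&rft.genre=article&rft.jtitle=Scientific%20reports"
    "&rft.volume=6&rft.issue=1&rft.spage=27722&rft.issn=2045-2322"
    "&rft_id=info:doi/10.1038/srep27722&rft_id=info:pmid/27278669"
)

fields = openurl_fields(example_url)
print(fields["rft.jtitle"])
print(identifiers(fields))
```

`parse_qs` returns a list for every key because a parameter such as `rft_id` may legitimately repeat, which is why the DOI and the PMID can both be recovered from the same link.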