Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes

Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the gen...

Bibliographic details
Published in: American journal of human genetics 2017-11, Vol.101 (5), p.700-715
Main authors: Tang, Haibao ; Kirkness, Ewen F. ; Lippert, Christoph ; Biggs, William H. ; Fabani, Martin ; Guzman, Ernesto ; Ramakrishnan, Smriti ; Lavrenko, Victor ; Kakaradov, Boyko ; Hou, Claire ; Hicks, Barry ; Heckerman, David ; Och, Franz J. ; Caskey, C. Thomas ; Venter, J. Craig ; Telenti, Amalio
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 715
container_issue 5
container_start_page 700
container_title American journal of human genetics
container_volume 101
creator Tang, Haibao
Kirkness, Ewen F.
Lippert, Christoph
Biggs, William H.
Fabani, Martin
Guzman, Ernesto
Ramakrishnan, Smriti
Lavrenko, Victor
Kakaradov, Boyko
Hou, Claire
Hicks, Barry
Heckerman, David
Och, Franz J.
Caskey, C. Thomas
Venter, J. Craig
Telenti, Amalio
description Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are entirely guanine or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.
doi_str_mv 10.1016/j.ajhg.2017.09.013
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5673627</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0002929717303828</els_id><sourcerecordid>1964268452</sourcerecordid><originalsourceid>FETCH-LOGICAL-c521t-be78c29a0e7fc8ba75fe5c119ea49401d32fe279b4e5bff3a11afc5d186fe35a3</originalsourceid><addsrcrecordid>eNp9kUFv1DAQhS0EotvCH-CAfORAgseO7VhCSFUpLVIFCIo4Wo4z3vUqiRc7W4l_T1ZbKrhwmsP73pvRPEJeAKuBgXqzrd12s645A10zUzMQj8gKpNCVUkw-JivGGK8MN_qEnJayZQygZeIpOeEGFq1tVuTTl5xCHOK0pinQb5uU5-rWTT2O1VfcoZvp-1jQFaTnw4ADFhonCvy1Epxe70c30R-bNCC9wimNWJ6RJ8ENBZ_fzzPy_cPl7cV1dfP56uPF-U3lJYe56lC3nhvHUAffdk7LgNIDGHSNaRj0ggfk2nQNyi4E4QBc8LKHVgUU0okz8u6Yu9t3I_Yepzm7we5yHF3-ZZOL9l9lihu7TndWKi0U10vAq_uAnH7uscx2jMXjMLgJ075YMKrhqm0kX1B-RH1OpWQMD2uA2UMRdmsPRdhDEZYZuxSxmF7-feCD5c_nF-DtEcDlTXcRsy0-4uSxjxn9bPsU_5f_G690mfM</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1964268452</pqid></control><display><type>article</type><title>Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes</title><source>MEDLINE</source><source>Cell Press Free Archives</source><source>Elsevier ScienceDirect Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><creator>Tang, Haibao ; Kirkness, Ewen F. ; Lippert, Christoph ; Biggs, William H. ; Fabani, Martin ; Guzman, Ernesto ; Ramakrishnan, Smriti ; Lavrenko, Victor ; Kakaradov, Boyko ; Hou, Claire ; Hicks, Barry ; Heckerman, David ; Och, Franz J. ; Caskey, C. Thomas ; Venter, J. Craig ; Telenti, Amalio</creator><creatorcontrib>Tang, Haibao ; Kirkness, Ewen F. ; Lippert, Christoph ; Biggs, William H. ; Fabani, Martin ; Guzman, Ernesto ; Ramakrishnan, Smriti ; Lavrenko, Victor ; Kakaradov, Boyko ; Hou, Claire ; Hicks, Barry ; Heckerman, David ; Och, Franz J. ; Caskey, C. 
Thomas ; Venter, J. Craig ; Telenti, Amalio</creatorcontrib><description>Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are entirely guanine or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. 
This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.</description><identifier>ISSN: 0002-9297</identifier><identifier>EISSN: 1537-6605</identifier><identifier>DOI: 10.1016/j.ajhg.2017.09.013</identifier><identifier>PMID: 29100084</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Adolescent ; Adult ; Alleles ; Child ; Female ; genetic disorder ; Genetics, Population - methods ; genome sequencing ; Genome, Human - genetics ; genotyping ; High-Throughput Nucleotide Sequencing - methods ; Humans ; Male ; Microsatellite Repeats - genetics ; microsatellites ; Middle Aged ; Polymorphism, Genetic - genetics ; population genetics ; Sequence Analysis, DNA - methods ; short tandem repeats ; Software ; trinucleotide repeat expansion</subject><ispartof>American journal of human genetics, 2017-11, Vol.101 (5), p.700-715</ispartof><rights>2017 The Author(s)</rights><rights>Copyright © 2017 The Author(s). Published by Elsevier Inc. 
All rights reserved.</rights><rights>2017 The Author(s) 2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c521t-be78c29a0e7fc8ba75fe5c119ea49401d32fe279b4e5bff3a11afc5d186fe35a3</citedby><cites>FETCH-LOGICAL-c521t-be78c29a0e7fc8ba75fe5c119ea49401d32fe279b4e5bff3a11afc5d186fe35a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5673627/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0002929717303828$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,3537,27901,27902,53766,53768,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29100084$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Tang, Haibao</creatorcontrib><creatorcontrib>Kirkness, Ewen F.</creatorcontrib><creatorcontrib>Lippert, Christoph</creatorcontrib><creatorcontrib>Biggs, William H.</creatorcontrib><creatorcontrib>Fabani, Martin</creatorcontrib><creatorcontrib>Guzman, Ernesto</creatorcontrib><creatorcontrib>Ramakrishnan, Smriti</creatorcontrib><creatorcontrib>Lavrenko, Victor</creatorcontrib><creatorcontrib>Kakaradov, Boyko</creatorcontrib><creatorcontrib>Hou, Claire</creatorcontrib><creatorcontrib>Hicks, Barry</creatorcontrib><creatorcontrib>Heckerman, David</creatorcontrib><creatorcontrib>Och, Franz J.</creatorcontrib><creatorcontrib>Caskey, C. Thomas</creatorcontrib><creatorcontrib>Venter, J. 
Craig</creatorcontrib><creatorcontrib>Telenti, Amalio</creatorcontrib><title>Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes</title><title>American journal of human genetics</title><addtitle>Am J Hum Genet</addtitle><description>Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are entirely guanine or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. 
This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.</description><subject>Adolescent</subject><subject>Adult</subject><subject>Alleles</subject><subject>Child</subject><subject>Female</subject><subject>genetic disorder</subject><subject>Genetics, Population - methods</subject><subject>genome sequencing</subject><subject>Genome, Human - genetics</subject><subject>genotyping</subject><subject>High-Throughput Nucleotide Sequencing - methods</subject><subject>Humans</subject><subject>Male</subject><subject>Microsatellite Repeats - genetics</subject><subject>microsatellites</subject><subject>Middle Aged</subject><subject>Polymorphism, Genetic - genetics</subject><subject>population genetics</subject><subject>Sequence Analysis, DNA - methods</subject><subject>short tandem repeats</subject><subject>Software</subject><subject>trinucleotide repeat expansion</subject><issn>0002-9297</issn><issn>1537-6605</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kUFv1DAQhS0EotvCH-CAfORAgseO7VhCSFUpLVIFCIo4Wo4z3vUqiRc7W4l_T1ZbKrhwmsP73pvRPEJeAKuBgXqzrd12s645A10zUzMQj8gKpNCVUkw-JivGGK8MN_qEnJayZQygZeIpOeEGFq1tVuTTl5xCHOK0pinQb5uU5-rWTT2O1VfcoZvp-1jQFaTnw4ADFhonCvy1Epxe70c30R-bNCC9wimNWJ6RJ8ENBZ_fzzPy_cPl7cV1dfP56uPF-U3lJYe56lC3nhvHUAffdk7LgNIDGHSNaRj0ggfk2nQNyi4E4QBc8LKHVgUU0okz8u6Yu9t3I_Yepzm7we5yHF3-ZZOL9l9lihu7TndWKi0U10vAq_uAnH7uscx2jMXjMLgJ075YMKrhqm0kX1B-RH1OpWQMD2uA2UMRdmsPRdhDEZYZuxSxmF7-feCD5c_nF-DtEcDlTXcRsy0-4uSxjxn9bPsU_5f_G690mfM</recordid><startdate>20171102</startdate><enddate>20171102</enddate><creator>Tang, Haibao</creator><creator>Kirkness, Ewen F.</creator><creator>Lippert, Christoph</creator><creator>Biggs, William H.</creator><creator>Fabani, Martin</creator><creator>Guzman, Ernesto</creator><creator>Ramakrishnan, Smriti</creator><creator>Lavrenko, 
Victor</creator><creator>Kakaradov, Boyko</creator><creator>Hou, Claire</creator><creator>Hicks, Barry</creator><creator>Heckerman, David</creator><creator>Och, Franz J.</creator><creator>Caskey, C. Thomas</creator><creator>Venter, J. Craig</creator><creator>Telenti, Amalio</creator><general>Elsevier Inc</general><general>Elsevier</general><scope>6I.</scope><scope>AAFTH</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20171102</creationdate><title>Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes</title><author>Tang, Haibao ; Kirkness, Ewen F. ; Lippert, Christoph ; Biggs, William H. ; Fabani, Martin ; Guzman, Ernesto ; Ramakrishnan, Smriti ; Lavrenko, Victor ; Kakaradov, Boyko ; Hou, Claire ; Hicks, Barry ; Heckerman, David ; Och, Franz J. ; Caskey, C. Thomas ; Venter, J. Craig ; Telenti, Amalio</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c521t-be78c29a0e7fc8ba75fe5c119ea49401d32fe279b4e5bff3a11afc5d186fe35a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Adolescent</topic><topic>Adult</topic><topic>Alleles</topic><topic>Child</topic><topic>Female</topic><topic>genetic disorder</topic><topic>Genetics, Population - methods</topic><topic>genome sequencing</topic><topic>Genome, Human - genetics</topic><topic>genotyping</topic><topic>High-Throughput Nucleotide Sequencing - methods</topic><topic>Humans</topic><topic>Male</topic><topic>Microsatellite Repeats - genetics</topic><topic>microsatellites</topic><topic>Middle Aged</topic><topic>Polymorphism, Genetic - genetics</topic><topic>population genetics</topic><topic>Sequence Analysis, DNA - methods</topic><topic>short tandem repeats</topic><topic>Software</topic><topic>trinucleotide repeat 
expansion</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tang, Haibao</creatorcontrib><creatorcontrib>Kirkness, Ewen F.</creatorcontrib><creatorcontrib>Lippert, Christoph</creatorcontrib><creatorcontrib>Biggs, William H.</creatorcontrib><creatorcontrib>Fabani, Martin</creatorcontrib><creatorcontrib>Guzman, Ernesto</creatorcontrib><creatorcontrib>Ramakrishnan, Smriti</creatorcontrib><creatorcontrib>Lavrenko, Victor</creatorcontrib><creatorcontrib>Kakaradov, Boyko</creatorcontrib><creatorcontrib>Hou, Claire</creatorcontrib><creatorcontrib>Hicks, Barry</creatorcontrib><creatorcontrib>Heckerman, David</creatorcontrib><creatorcontrib>Och, Franz J.</creatorcontrib><creatorcontrib>Caskey, C. Thomas</creatorcontrib><creatorcontrib>Venter, J. Craig</creatorcontrib><creatorcontrib>Telenti, Amalio</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>American journal of human genetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tang, Haibao</au><au>Kirkness, Ewen F.</au><au>Lippert, Christoph</au><au>Biggs, William H.</au><au>Fabani, Martin</au><au>Guzman, Ernesto</au><au>Ramakrishnan, Smriti</au><au>Lavrenko, Victor</au><au>Kakaradov, Boyko</au><au>Hou, Claire</au><au>Hicks, Barry</au><au>Heckerman, David</au><au>Och, Franz J.</au><au>Caskey, C. Thomas</au><au>Venter, J. 
Craig</au><au>Telenti, Amalio</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes</atitle><jtitle>American journal of human genetics</jtitle><addtitle>Am J Hum Genet</addtitle><date>2017-11-02</date><risdate>2017</risdate><volume>101</volume><issue>5</issue><spage>700</spage><epage>715</epage><pages>700-715</pages><issn>0002-9297</issn><eissn>1537-6605</eissn><abstract>Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci that are entirely guanine or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. 
TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>29100084</pmid><doi>10.1016/j.ajhg.2017.09.013</doi><tpages>16</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0002-9297
ispartof American journal of human genetics, 2017-11, Vol.101 (5), p.700-715
issn 0002-9297
1537-6605
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5673627
source MEDLINE; Cell Press Free Archives; Elsevier ScienceDirect Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central
subjects Adolescent
Adult
Alleles
Child
Female
genetic disorder
Genetics, Population - methods
genome sequencing
Genome, Human - genetics
genotyping
High-Throughput Nucleotide Sequencing - methods
Humans
Male
Microsatellite Repeats - genetics
microsatellites
Middle Aged
Polymorphism, Genetic - genetics
population genetics
Sequence Analysis, DNA - methods
short tandem repeats
Software
trinucleotide repeat expansion
title Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T04%3A43%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Profiling%20of%20Short-Tandem-Repeat%20Disease%20Alleles%20in%2012,632%20Human%20Whole%20Genomes&rft.jtitle=American%20journal%20of%20human%20genetics&rft.au=Tang,%20Haibao&rft.date=2017-11-02&rft.volume=101&rft.issue=5&rft.spage=700&rft.epage=715&rft.pages=700-715&rft.issn=0002-9297&rft.eissn=1537-6605&rft_id=info:doi/10.1016/j.ajhg.2017.09.013&rft_dat=%3Cproquest_pubme%3E1964268452%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1964268452&rft_id=info:pmid/29100084&rft_els_id=S0002929717303828&rfr_iscdi=true