A Custom Correlation Coefficient (CCC) Approach for Fast Identification of Multi-SNP Association Patterns in Genome-Wide SNPs Data

ABSTRACT Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of custom correlation coefficient (CCC) between single nucleo...


Bibliographic details
Published in: Genetic epidemiology 2014-11, Vol.38 (7), p.610-621
Main authors: Climer, Sharlee, Yang, Wei, de las Fuentes, Lisa, Dávila-Román, Victor G., Gu, C. Charles
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 621
container_issue 7
container_start_page 610
container_title Genetic epidemiology
container_volume 38
creator Climer, Sharlee
Yang, Wei
de las Fuentes, Lisa
Dávila-Román, Victor G.
Gu, C. Charles
description ABSTRACT Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of custom correlation coefficient (CCC) between single nucleotide polymorphisms (SNPs) that addresses genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3‐step process to identify candidate multi‐SNP patterns: (1) pairwise (SNP–SNP) correlations are computed using CCC; (2) clusters of so‐correlated SNPs are identified; and (3) frequencies of these clusters in disease cases and controls are compared to identify disease‐associated multi‐SNP patterns. This method identified 42 candidate multi‐SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (six genes) included 13 in SLC8A1 (aka NCX1, an essential component of cardiac excitation‐contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1. While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles was found in 20% of controls but no cases and the other in 3% of controls but 20% of cases. These results suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1, modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease‐associated multi‐SNP patterns overlooked by commonly used correlation measures. Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets.
doi_str_mv 10.1002/gepi.21833
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4190009</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1609307733</sourcerecordid><originalsourceid>FETCH-LOGICAL-c5893-69a1291a98db261dfa58f627dd0e58fe5fd1561e0be4eddc09fc15d85d73dd9a3</originalsourceid><addsrcrecordid>eNqNkkFv1DAQhS0EotuFCz8AWeJSkFI88TqJL0ir0C4rlrIIUCUuljeetG6zcbCTQq_8clzSroAD4uSR5ntPM-NHyBNgh8BY-vIMO3uYQsH5PTIBJoskTfP0PpmwfAYJ41Lskf0QLhgDmEnxkOylArJCitmE_JjTcgi929LSeY-N7q1rY411bSuLbU8PyrJ8Tudd552uzmntPD3WoadLE7s2UqPE1fTd0PQ2-XiypvMQXGXHxlr3Pfo2UNvSBbZui8mpNUgjF-hr3etH5EGtm4CPb98p-Xx89Kl8k6zeL5blfJVUopA8yaSGVIKWhdmkGZhai6LO0twYhrFCURsQGSDb4AyNqZisKxCmECbnxkjNp-TV6NsNmy2aKo7vdaM6b7faXyunrfqz09pzdeau1AwkY0xGg4NbA---Dhh6tbWhwqbRLbohKMgAMs7iaf8DjX4szzmP6LO_0As3-DZeQoEosriziNyUvBipyrsQPNa7uYGpmxSomxSoXymI8NPfN92hd98eARiBb7bB639YqcXRenlnmowaG3r8vtNof6mynOdCnZ4s1Prtl4X8IDO14j8B1BLMZg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1586129577</pqid></control><display><type>article</type><title>A Custom Correlation Coefficient (CCC) Approach for Fast Identification of Multi-SNP Association Patterns in Genome-Wide SNPs Data</title><source>MEDLINE</source><source>Wiley Online Library All Journals</source><creator>Climer, Sharlee ; Yang, Wei ; de las Fuentes, Lisa ; Dávila-Román, Victor G. ; Gu, C. Charles</creator><creatorcontrib>Climer, Sharlee ; Yang, Wei ; de las Fuentes, Lisa ; Dávila-Román, Victor G. ; Gu, C. Charles</creatorcontrib><description>ABSTRACT Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). 
We introduce a novel concept of custom correlation coefficient (CCC) between single nucleotide polymorphisms (SNPs) that address genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3‐step process to identify candidate multi‐SNP patterns: (1) pairwise (SNP–SNP) correlations are computed using CCC; (2) clusters of so‐correlated SNPs identified; and (3) frequencies of these clusters in disease cases and controls compared to identify disease‐associated multi‐SNP patterns. This method identified 42 candidate multi‐SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (six genes) included 13 in SLC8A1 (aka NCX1, an essential component of cardiac excitation‐contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1. While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles were found in 20% of controls but no cases and the other in 3% of controls but 20% of cases. These suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1, modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease‐associated multi‐SNP patterns overlooked by commonly used correlation measures. 
Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets.</description><identifier>ISSN: 0741-0395</identifier><identifier>EISSN: 1098-2272</identifier><identifier>DOI: 10.1002/gepi.21833</identifier><identifier>PMID: 25168954</identifier><language>eng</language><publisher>United States: Blackwell Publishing Ltd</publisher><subject>Algorithms ; Case-Control Studies ; Cluster Analysis ; Computer Simulation ; custom correlation coefficient ; Disease ; Epistasis, Genetic ; Gene Frequency ; gene-gene interaction ; Genes ; Genetic Predisposition to Disease ; Genome, Human ; Genome-Wide Association Study ; genome-wide interactions study (GWIS) ; Genotype ; Heart Diseases - genetics ; Humans ; Models, Genetic ; multi-SNP association ; network analysis ; Polymorphism, Single Nucleotide</subject><ispartof>Genetic epidemiology, 2014-11, Vol.38 (7), p.610-621</ispartof><rights>2014 WILEY PERIODICALS, INC.</rights><rights>2014 Wiley Periodicals, Inc.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c5893-69a1291a98db261dfa58f627dd0e58fe5fd1561e0be4eddc09fc15d85d73dd9a3</citedby><cites>FETCH-LOGICAL-c5893-69a1291a98db261dfa58f627dd0e58fe5fd1561e0be4eddc09fc15d85d73dd9a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fgepi.21833$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fgepi.21833$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>230,314,780,784,885,1417,27924,27925,45574,45575</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/25168954$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Climer, 
Sharlee</creatorcontrib><creatorcontrib>Yang, Wei</creatorcontrib><creatorcontrib>de las Fuentes, Lisa</creatorcontrib><creatorcontrib>Dávila-Román, Victor G.</creatorcontrib><creatorcontrib>Gu, C. Charles</creatorcontrib><title>A Custom Correlation Coefficient (CCC) Approach for Fast Identification of Multi-SNP Association Patterns in Genome-Wide SNPs Data</title><title>Genetic epidemiology</title><addtitle>Genet. Epidemiol</addtitle><description>ABSTRACT Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of custom correlation coefficient (CCC) between single nucleotide polymorphisms (SNPs) that address genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3‐step process to identify candidate multi‐SNP patterns: (1) pairwise (SNP–SNP) correlations are computed using CCC; (2) clusters of so‐correlated SNPs identified; and (3) frequencies of these clusters in disease cases and controls compared to identify disease‐associated multi‐SNP patterns. This method identified 42 candidate multi‐SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (six genes) included 13 in SLC8A1 (aka NCX1, an essential component of cardiac excitation‐contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1. While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles were found in 20% of controls but no cases and the other in 3% of controls but 20% of cases. These suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1, modified by variants from other genes. 
The results demonstrate that this new correlation metric identifies disease‐associated multi‐SNP patterns overlooked by commonly used correlation measures. Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets.</description><subject>Algorithms</subject><subject>Case-Control Studies</subject><subject>Cluster Analysis</subject><subject>Computer Simulation</subject><subject>custom correlation coefficient</subject><subject>Disease</subject><subject>Epistasis, Genetic</subject><subject>Gene Frequency</subject><subject>gene-gene interaction</subject><subject>Genes</subject><subject>Genetic Predisposition to Disease</subject><subject>Genome, Human</subject><subject>Genome-Wide Association Study</subject><subject>genome-wide interactions study (GWIS)</subject><subject>Genotype</subject><subject>Heart Diseases - genetics</subject><subject>Humans</subject><subject>Models, Genetic</subject><subject>multi-SNP association</subject><subject>network analysis</subject><subject>Polymorphism, Single Nucleotide</subject><issn>0741-0395</issn><issn>1098-2272</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkkFv1DAQhS0EotuFCz8AWeJSkFI88TqJL0ir0C4rlrIIUCUuljeetG6zcbCTQq_8clzSroAD4uSR5ntPM-NHyBNgh8BY-vIMO3uYQsH5PTIBJoskTfP0PpmwfAYJ41Lskf0QLhgDmEnxkOylArJCitmE_JjTcgi929LSeY-N7q1rY411bSuLbU8PyrJ8Tudd552uzmntPD3WoadLE7s2UqPE1fTd0PQ2-XiypvMQXGXHxlr3Pfo2UNvSBbZui8mpNUgjF-hr3etH5EGtm4CPb98p-Xx89Kl8k6zeL5blfJVUopA8yaSGVIKWhdmkGZhai6LO0twYhrFCURsQGSDb4AyNqZisKxCmECbnxkjNp-TV6NsNmy2aKo7vdaM6b7faXyunrfqz09pzdeau1AwkY0xGg4NbA---Dhh6tbWhwqbRLbohKMgAMs7iaf8DjX4szzmP6LO_0As3-DZeQoEosriziNyUvBipyrsQPNa7uYGpmxSomxSoXymI8NPfN92hd98eARiBb7bB639YqcXRenlnmowaG3r8vtNof6mynOdCnZ4s1Prtl4X8IDO14j8B1BLMZg</recordid><startdate>201411</startdate><enddate>201411</enddate><creator>Climer, 
Sharlee</creator><creator>Yang, Wei</creator><creator>de las Fuentes, Lisa</creator><creator>Dávila-Román, Victor G.</creator><creator>Gu, C. Charles</creator><general>Blackwell Publishing Ltd</general><general>Wiley Subscription Services, Inc</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QP</scope><scope>7QR</scope><scope>7TK</scope><scope>8FD</scope><scope>FR3</scope><scope>K9.</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>201411</creationdate><title>A Custom Correlation Coefficient (CCC) Approach for Fast Identification of Multi-SNP Association Patterns in Genome-Wide SNPs Data</title><author>Climer, Sharlee ; Yang, Wei ; de las Fuentes, Lisa ; Dávila-Román, Victor G. ; Gu, C. Charles</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c5893-69a1291a98db261dfa58f627dd0e58fe5fd1561e0be4eddc09fc15d85d73dd9a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Algorithms</topic><topic>Case-Control Studies</topic><topic>Cluster Analysis</topic><topic>Computer Simulation</topic><topic>custom correlation coefficient</topic><topic>Disease</topic><topic>Epistasis, Genetic</topic><topic>Gene Frequency</topic><topic>gene-gene interaction</topic><topic>Genes</topic><topic>Genetic Predisposition to Disease</topic><topic>Genome, Human</topic><topic>Genome-Wide Association Study</topic><topic>genome-wide interactions study (GWIS)</topic><topic>Genotype</topic><topic>Heart Diseases - genetics</topic><topic>Humans</topic><topic>Models, Genetic</topic><topic>multi-SNP association</topic><topic>network analysis</topic><topic>Polymorphism, Single Nucleotide</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Climer, 
Sharlee</creatorcontrib><creatorcontrib>Yang, Wei</creatorcontrib><creatorcontrib>de las Fuentes, Lisa</creatorcontrib><creatorcontrib>Dávila-Román, Victor G.</creatorcontrib><creatorcontrib>Gu, C. Charles</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Genetic epidemiology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Climer, Sharlee</au><au>Yang, Wei</au><au>de las Fuentes, Lisa</au><au>Dávila-Román, Victor G.</au><au>Gu, C. Charles</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Custom Correlation Coefficient (CCC) Approach for Fast Identification of Multi-SNP Association Patterns in Genome-Wide SNPs Data</atitle><jtitle>Genetic epidemiology</jtitle><addtitle>Genet. 
Epidemiol</addtitle><date>2014-11</date><risdate>2014</risdate><volume>38</volume><issue>7</issue><spage>610</spage><epage>621</epage><pages>610-621</pages><issn>0741-0395</issn><eissn>1098-2272</eissn><abstract>ABSTRACT Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of custom correlation coefficient (CCC) between single nucleotide polymorphisms (SNPs) that address genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3‐step process to identify candidate multi‐SNP patterns: (1) pairwise (SNP–SNP) correlations are computed using CCC; (2) clusters of so‐correlated SNPs identified; and (3) frequencies of these clusters in disease cases and controls compared to identify disease‐associated multi‐SNP patterns. This method identified 42 candidate multi‐SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (six genes) included 13 in SLC8A1 (aka NCX1, an essential component of cardiac excitation‐contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1. While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles were found in 20% of controls but no cases and the other in 3% of controls but 20% of cases. These suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1, modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease‐associated multi‐SNP patterns overlooked by commonly used correlation measures. 
Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analyses of large GWAS datasets.</abstract><cop>United States</cop><pub>Blackwell Publishing Ltd</pub><pmid>25168954</pmid><doi>10.1002/gepi.21833</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0741-0395
ispartof Genetic epidemiology, 2014-11, Vol.38 (7), p.610-621
issn 0741-0395
1098-2272
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4190009
source MEDLINE; Wiley Online Library All Journals
subjects Algorithms
Case-Control Studies
Cluster Analysis
Computer Simulation
custom correlation coefficient
Disease
Epistasis, Genetic
Gene Frequency
gene-gene interaction
Genes
Genetic Predisposition to Disease
Genome, Human
Genome-Wide Association Study
genome-wide interactions study (GWIS)
Genotype
Heart Diseases - genetics
Humans
Models, Genetic
multi-SNP association
network analysis
Polymorphism, Single Nucleotide
title A Custom Correlation Coefficient (CCC) Approach for Fast Identification of Multi-SNP Association Patterns in Genome-Wide SNPs Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T13%3A53%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Custom%20Correlation%20Coefficient%20(CCC)%20Approach%20for%20Fast%20Identification%20of%20Multi-SNP%20Association%20Patterns%20in%20Genome-Wide%20SNPs%20Data&rft.jtitle=Genetic%20epidemiology&rft.au=Climer,%20Sharlee&rft.date=2014-11&rft.volume=38&rft.issue=7&rft.spage=610&rft.epage=621&rft.pages=610-621&rft.issn=0741-0395&rft.eissn=1098-2272&rft_id=info:doi/10.1002/gepi.21833&rft_dat=%3Cproquest_pubme%3E1609307733%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1586129577&rft_id=info:pmid/25168954&rfr_iscdi=true
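
Illustrative sketch of the 3-step process summarized in the abstract. This record does not give the CCC formula, so the code below substitutes plain Pearson correlation as a stand-in for step 1; it is not the authors' metric. The clustering rule (connected components above a threshold), the "carrier" definition (at least one minor allele at every SNP in the cluster), the function names, and the toy genotype data are all assumptions made only for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Stand-in pairwise measure (NOT the paper's CCC): Pearson's r."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # a constant SNP carries no correlation signal
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def cluster_snps(genotypes, threshold):
    """Steps 1 and 2: link SNP pairs with |r| >= threshold, then return
    the connected components that contain more than one SNP."""
    n = len(genotypes)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if abs(pearson(genotypes[i], genotypes[j])) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) > 1)


def carrier_frequency(genotypes, cluster, individuals):
    """Step 3: fraction of the given individuals who carry at least one
    minor allele at every SNP in the cluster (assumed definition)."""
    if not individuals:
        return 0.0
    carriers = sum(all(genotypes[s][p] > 0 for s in cluster) for p in individuals)
    return carriers / len(individuals)


# Hypothetical toy data: genotypes[snp][person] = minor-allele count (0, 1, 2).
G = [
    [1, 2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1],
    [2, 0, 1, 1, 2, 0],
]
cases, controls = [0, 1, 2], [3, 4, 5]

clusters = cluster_snps(G, threshold=0.9)
for c in clusters:
    print(c, carrier_frequency(G, c, cases), carrier_frequency(G, c, controls))
```

On the toy data only SNPs 0 and 1 are linked, and that pattern is carried by all three hypothetical cases and none of the controls. A real analysis would replace `pearson` with the published CCC definition and add a statistical test for the case/control frequency difference.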