The sequence kernel association test for multicategorical outcomes

Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either...

Bibliographic details
Published in: Genetic epidemiology 2023-09, Vol.47 (6), p.432-449
Main authors: Jiang, Zhiwen, Zhang, Haoyu, Ahearn, Thomas U., Garcia‐Closas, Montserrat, Chatterjee, Nilanjan, Zhu, Hongtu, Zhan, Xiang, Zhao, Ni
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 449
container_issue 6
container_start_page 432
container_title Genetic epidemiology
container_volume 47
creator Jiang, Zhiwen
Zhang, Haoyu
Ahearn, Thomas U.
Garcia‐Closas, Montserrat
Chatterjee, Nilanjan
Zhu, Hongtu
Zhan, Xiang
Zhao, Ni
description Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. In this paper, we proposed a novel set‐based association analysis method, sequence kernel association test (SKAT)‐MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT‐MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT‐MC to the Polish breast cancer study (PBCS), and identified that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data ($N=127,127$) with SKAT‐MC, and identified 21 significant genes in the genome. Consequently, SKAT‐MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT‐MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.
doi_str_mv 10.1002/gepi.22527
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2803965996</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2850217070</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3527-a13211d4cff1584c4ee124c75b1ff98730981a6c562ed2b1765b5d40db8339593</originalsourceid><addsrcrecordid>eNp9kLFOwzAURS0EoqWw8AEoEgtCSvFz4jgZoSqlUiUYymw5zktJSeJiJ0L9e1xaGBiY7nJ0de4l5BLoGChldyvcVGPGOBNHZAg0S0PGBDsmQypiCGmU8QE5c25NKUCc8VMyiAQVKdB0SB6Wbxg4_Oix1Ri8o22xDpRzRleqq0wbdOi6oDQ2aPq6q7TqcGWszzowfadNg-6cnJSqdnhxyBF5fZwuJ0_h4nk2n9wvQh15tVBBxACKWJcl8DTWMSKwWAueQ1lmqYi8OKhE84RhwXIQCc95EdMiTyM_IYtG5Gbfu7HG-7pONpXTWNeqRdM7yVI_NeFZlnj0-g-6Nr1tvZ2nOGXg91NP3e4pbY1zFku5sVWj7FYClbtn5e5Z-f2sh68OlX3eYPGL_lzpAdgDn1WN23-q5Gz6Mt-XfgGxOYII</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2850217070</pqid></control><display><type>article</type><title>The sequence kernel association test for multicategorical outcomes</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Jiang, Zhiwen ; Zhang, Haoyu ; Ahearn, Thomas U. ; Garcia‐Closas, Montserrat ; Chatterjee, Nilanjan ; Zhu, Hongtu ; Zhan, Xiang ; Zhao, Ni</creator><creatorcontrib>Jiang, Zhiwen ; Zhang, Haoyu ; Ahearn, Thomas U. ; Garcia‐Closas, Montserrat ; Chatterjee, Nilanjan ; Zhu, Hongtu ; Zhan, Xiang ; Zhao, Ni</creatorcontrib><description>Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. 
In this paper, we proposed a novel set‐based association analysis method, sequence kernel association test (SKAT)‐MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT‐MC effectively preserves the nominal type I error rate while substantially increases the statistical power compared to existing methods under various scenarios. We applied SKAT‐MC to the Polish breast cancer study (PBCS), and identified gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data (N = 127 , 127 $N=127,127$) with SKAT‐MC, and identified 21 significant genes in the genome. Consequently, SKAT‐MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT‐MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.</description><identifier>ISSN: 0741-0395</identifier><identifier>EISSN: 1098-2272</identifier><identifier>DOI: 10.1002/gepi.22527</identifier><identifier>PMID: 37078108</identifier><language>eng</language><publisher>United States: Wiley Subscription Services, Inc</publisher><subject>Association analysis ; Breast cancer ; Estrogen receptors ; Fibroblast growth factor receptor 2 ; Genetic analysis ; Genomes ; multicategorical data ; SKAT ; the generalized logit model ; the proportional odds model</subject><ispartof>Genetic epidemiology, 2023-09, Vol.47 (6), p.432-449</ispartof><rights>2023 The Authors. published by Wiley Periodicals LLC.</rights><rights>2023 The Authors. Genetic Epidemiology published by Wiley Periodicals LLC.</rights><rights>2023. This article is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c3527-a13211d4cff1584c4ee124c75b1ff98730981a6c562ed2b1765b5d40db8339593</cites><orcidid>0000-0002-7762-3949 ; 0000-0001-8841-618X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fgepi.22527$$EPDF$$P50$$Gwiley$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fgepi.22527$$EHTML$$P50$$Gwiley$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37078108$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Jiang, Zhiwen</creatorcontrib><creatorcontrib>Zhang, Haoyu</creatorcontrib><creatorcontrib>Ahearn, Thomas U.</creatorcontrib><creatorcontrib>Garcia‐Closas, Montserrat</creatorcontrib><creatorcontrib>Chatterjee, Nilanjan</creatorcontrib><creatorcontrib>Zhu, Hongtu</creatorcontrib><creatorcontrib>Zhan, Xiang</creatorcontrib><creatorcontrib>Zhao, Ni</creatorcontrib><title>The sequence kernel association test for multicategorical outcomes</title><title>Genetic epidemiology</title><addtitle>Genet Epidemiol</addtitle><description>Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. 
In this paper, we proposed a novel set‐based association analysis method, sequence kernel association test (SKAT)‐MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT‐MC effectively preserves the nominal type I error rate while substantially increases the statistical power compared to existing methods under various scenarios. We applied SKAT‐MC to the Polish breast cancer study (PBCS), and identified gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data (N = 127 , 127 $N=127,127$) with SKAT‐MC, and identified 21 significant genes in the genome. Consequently, SKAT‐MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT‐MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.</description><subject>Association analysis</subject><subject>Breast cancer</subject><subject>Estrogen receptors</subject><subject>Fibroblast growth factor receptor 2</subject><subject>Genetic analysis</subject><subject>Genomes</subject><subject>multicategorical data</subject><subject>SKAT</subject><subject>the generalized logit model</subject><subject>the proportional odds 
model</subject><issn>0741-0395</issn><issn>1098-2272</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>24P</sourceid><recordid>eNp9kLFOwzAURS0EoqWw8AEoEgtCSvFz4jgZoSqlUiUYymw5zktJSeJiJ0L9e1xaGBiY7nJ0de4l5BLoGChldyvcVGPGOBNHZAg0S0PGBDsmQypiCGmU8QE5c25NKUCc8VMyiAQVKdB0SB6Wbxg4_Oix1Ri8o22xDpRzRleqq0wbdOi6oDQ2aPq6q7TqcGWszzowfadNg-6cnJSqdnhxyBF5fZwuJ0_h4nk2n9wvQh15tVBBxACKWJcl8DTWMSKwWAueQ1lmqYi8OKhE84RhwXIQCc95EdMiTyM_IYtG5Gbfu7HG-7pONpXTWNeqRdM7yVI_NeFZlnj0-g-6Nr1tvZ2nOGXg91NP3e4pbY1zFku5sVWj7FYClbtn5e5Z-f2sh68OlX3eYPGL_lzpAdgDn1WN23-q5Gz6Mt-XfgGxOYII</recordid><startdate>202309</startdate><enddate>202309</enddate><creator>Jiang, Zhiwen</creator><creator>Zhang, Haoyu</creator><creator>Ahearn, Thomas U.</creator><creator>Garcia‐Closas, Montserrat</creator><creator>Chatterjee, Nilanjan</creator><creator>Zhu, Hongtu</creator><creator>Zhan, Xiang</creator><creator>Zhao, Ni</creator><general>Wiley Subscription Services, Inc</general><scope>24P</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QP</scope><scope>7QR</scope><scope>7TK</scope><scope>8FD</scope><scope>FR3</scope><scope>K9.</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-7762-3949</orcidid><orcidid>https://orcid.org/0000-0001-8841-618X</orcidid></search><sort><creationdate>202309</creationdate><title>The sequence kernel association test for multicategorical outcomes</title><author>Jiang, Zhiwen ; Zhang, Haoyu ; Ahearn, Thomas U. 
; Garcia‐Closas, Montserrat ; Chatterjee, Nilanjan ; Zhu, Hongtu ; Zhan, Xiang ; Zhao, Ni</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3527-a13211d4cff1584c4ee124c75b1ff98730981a6c562ed2b1765b5d40db8339593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Association analysis</topic><topic>Breast cancer</topic><topic>Estrogen receptors</topic><topic>Fibroblast growth factor receptor 2</topic><topic>Genetic analysis</topic><topic>Genomes</topic><topic>multicategorical data</topic><topic>SKAT</topic><topic>the generalized logit model</topic><topic>the proportional odds model</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jiang, Zhiwen</creatorcontrib><creatorcontrib>Zhang, Haoyu</creatorcontrib><creatorcontrib>Ahearn, Thomas U.</creatorcontrib><creatorcontrib>Garcia‐Closas, Montserrat</creatorcontrib><creatorcontrib>Chatterjee, Nilanjan</creatorcontrib><creatorcontrib>Zhu, Hongtu</creatorcontrib><creatorcontrib>Zhan, Xiang</creatorcontrib><creatorcontrib>Zhao, Ni</creatorcontrib><collection>Wiley Online Library Open Access</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Genetic epidemiology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jiang, Zhiwen</au><au>Zhang, Haoyu</au><au>Ahearn, Thomas 
U.</au><au>Garcia‐Closas, Montserrat</au><au>Chatterjee, Nilanjan</au><au>Zhu, Hongtu</au><au>Zhan, Xiang</au><au>Zhao, Ni</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The sequence kernel association test for multicategorical outcomes</atitle><jtitle>Genetic epidemiology</jtitle><addtitle>Genet Epidemiol</addtitle><date>2023-09</date><risdate>2023</risdate><volume>47</volume><issue>6</issue><spage>432</spage><epage>449</epage><pages>432-449</pages><issn>0741-0395</issn><eissn>1098-2272</eissn><abstract>Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. In this paper, we proposed a novel set‐based association analysis method, sequence kernel association test (SKAT)‐MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT‐MC effectively preserves the nominal type I error rate while substantially increases the statistical power compared to existing methods under various scenarios. We applied SKAT‐MC to the Polish breast cancer study (PBCS), and identified gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data (N = 127 , 127 $N=127,127$) with SKAT‐MC, and identified 21 significant genes in the genome. Consequently, SKAT‐MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. 
A freely distributed R package SKAT‐MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.</abstract><cop>United States</cop><pub>Wiley Subscription Services, Inc</pub><pmid>37078108</pmid><doi>10.1002/gepi.22527</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0002-7762-3949</orcidid><orcidid>https://orcid.org/0000-0001-8841-618X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0741-0395
ispartof Genetic epidemiology, 2023-09, Vol.47 (6), p.432-449
issn 0741-0395
1098-2272
language eng
recordid cdi_proquest_miscellaneous_2803965996
source Wiley Online Library Journals Frontfile Complete
subjects Association analysis
Breast cancer
Estrogen receptors
Fibroblast growth factor receptor 2
Genetic analysis
Genomes
multicategorical data
SKAT
the generalized logit model
the proportional odds model
title The sequence kernel association test for multicategorical outcomes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T20%3A24%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20sequence%20kernel%20association%20test%20for%20multicategorical%20outcomes&rft.jtitle=Genetic%20epidemiology&rft.au=Jiang,%20Zhiwen&rft.date=2023-09&rft.volume=47&rft.issue=6&rft.spage=432&rft.epage=449&rft.pages=432-449&rft.issn=0741-0395&rft.eissn=1098-2272&rft_id=info:doi/10.1002/gepi.22527&rft_dat=%3Cproquest_cross%3E2850217070%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2850217070&rft_id=info:pmid/37078108&rfr_iscdi=true