The sequence kernel association test for multicategorical outcomes
Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either...
Published in: | Genetic epidemiology 2023-09, Vol.47 (6), p.432-449 |
---|---|
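Main authors: | Jiang, Zhiwen ; Zhang, Haoyu ; Ahearn, Thomas U. ; Garcia‐Closas, Montserrat ; Chatterjee, Nilanjan ; Zhu, Hongtu ; Zhan, Xiang ; Zhao, Ni |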
Format: | Article |
Language: | eng |
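Subjects: | Association analysis ; Breast cancer ; Estrogen receptors ; Fibroblast growth factor receptor 2 ; Genetic analysis ; Genomes ; multicategorical data ; SKAT ; the generalized logit model ; the proportional odds model |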
Online access: | Full text |
container_end_page | 449 |
---|---|
container_issue | 6 |
container_start_page | 432 |
container_title | Genetic epidemiology |
container_volume | 47 |
creator | Jiang, Zhiwen ; Zhang, Haoyu ; Ahearn, Thomas U. ; Garcia‐Closas, Montserrat ; Chatterjee, Nilanjan ; Zhu, Hongtu ; Zhan, Xiang ; Zhao, Ni |
description | Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. In this paper, we proposed a novel set‐based association analysis method, sequence kernel association test (SKAT)‐MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT‐MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT‐MC to the Polish breast cancer study (PBCS), and identified that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data ($N=127,127$) with SKAT‐MC, and identified 21 significant genes in the genome. Consequently, SKAT‐MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT‐MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC. |
doi_str_mv | 10.1002/gepi.22527 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2803965996</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2850217070</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3527-a13211d4cff1584c4ee124c75b1ff98730981a6c562ed2b1765b5d40db8339593</originalsourceid><addsrcrecordid>eNp9kLFOwzAURS0EoqWw8AEoEgtCSvFz4jgZoSqlUiUYymw5zktJSeJiJ0L9e1xaGBiY7nJ0de4l5BLoGChldyvcVGPGOBNHZAg0S0PGBDsmQypiCGmU8QE5c25NKUCc8VMyiAQVKdB0SB6Wbxg4_Oix1Ri8o22xDpRzRleqq0wbdOi6oDQ2aPq6q7TqcGWszzowfadNg-6cnJSqdnhxyBF5fZwuJ0_h4nk2n9wvQh15tVBBxACKWJcl8DTWMSKwWAueQ1lmqYi8OKhE84RhwXIQCc95EdMiTyM_IYtG5Gbfu7HG-7pONpXTWNeqRdM7yVI_NeFZlnj0-g-6Nr1tvZ2nOGXg91NP3e4pbY1zFku5sVWj7FYClbtn5e5Z-f2sh68OlX3eYPGL_lzpAdgDn1WN23-q5Gz6Mt-XfgGxOYII</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2850217070</pqid></control><display><type>article</type><title>The sequence kernel association test for multicategorical outcomes</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Jiang, Zhiwen ; Zhang, Haoyu ; Ahearn, Thomas U. ; Garcia‐Closas, Montserrat ; Chatterjee, Nilanjan ; Zhu, Hongtu ; Zhan, Xiang ; Zhao, Ni</creator><creatorcontrib>Jiang, Zhiwen ; Zhang, Haoyu ; Ahearn, Thomas U. ; Garcia‐Closas, Montserrat ; Chatterjee, Nilanjan ; Zhu, Hongtu ; Zhan, Xiang ; Zhao, Ni</creatorcontrib><description>Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. 
In this paper, we proposed a novel set‐based association analysis method, sequence kernel association test (SKAT)‐MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT‐MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT‐MC to the Polish breast cancer study (PBCS), and identified that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data ($N=127,127$) with SKAT‐MC, and identified 21 significant genes in the genome. Consequently, SKAT‐MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT‐MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.</description><identifier>ISSN: 0741-0395</identifier><identifier>EISSN: 1098-2272</identifier><identifier>DOI: 10.1002/gepi.22527</identifier><identifier>PMID: 37078108</identifier><language>eng</language><publisher>United States: Wiley Subscription Services, Inc</publisher><subject>Association analysis ; Breast cancer ; Estrogen receptors ; Fibroblast growth factor receptor 2 ; Genetic analysis ; Genomes ; multicategorical data ; SKAT ; the generalized logit model ; the proportional odds model</subject><ispartof>Genetic epidemiology, 2023-09, Vol.47 (6), p.432-449</ispartof><rights>2023 The Authors. published by Wiley Periodicals LLC.</rights><rights>2023 The Authors. Genetic Epidemiology published by Wiley Periodicals LLC.</rights><rights>2023. This article is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c3527-a13211d4cff1584c4ee124c75b1ff98730981a6c562ed2b1765b5d40db8339593</cites><orcidid>0000-0002-7762-3949 ; 0000-0001-8841-618X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fgepi.22527$$EPDF$$P50$$Gwiley$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fgepi.22527$$EHTML$$P50$$Gwiley$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37078108$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Jiang, Zhiwen</creatorcontrib><creatorcontrib>Zhang, Haoyu</creatorcontrib><creatorcontrib>Ahearn, Thomas U.</creatorcontrib><creatorcontrib>Garcia‐Closas, Montserrat</creatorcontrib><creatorcontrib>Chatterjee, Nilanjan</creatorcontrib><creatorcontrib>Zhu, Hongtu</creatorcontrib><creatorcontrib>Zhan, Xiang</creatorcontrib><creatorcontrib>Zhao, Ni</creatorcontrib><title>The sequence kernel association test for multicategorical outcomes</title><title>Genetic epidemiology</title><addtitle>Genet Epidemiol</addtitle><description>Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. 
In this paper, we proposed a novel set‐based association analysis method, sequence kernel association test (SKAT)‐MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT‐MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT‐MC to the Polish breast cancer study (PBCS), and identified that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data ($N=127,127$) with SKAT‐MC, and identified 21 significant genes in the genome. Consequently, SKAT‐MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT‐MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.</description><subject>Association analysis</subject><subject>Breast cancer</subject><subject>Estrogen receptors</subject><subject>Fibroblast growth factor receptor 2</subject><subject>Genetic analysis</subject><subject>Genomes</subject><subject>multicategorical data</subject><subject>SKAT</subject><subject>the generalized logit model</subject><subject>the proportional odds model</subject><issn>0741-0395</issn><issn>1098-2272</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>24P</sourceid><recordid>eNp9kLFOwzAURS0EoqWw8AEoEgtCSvFz4jgZoSqlUiUYymw5zktJSeJiJ0L9e1xaGBiY7nJ0de4l5BLoGChldyvcVGPGOBNHZAg0S0PGBDsmQypiCGmU8QE5c25NKUCc8VMyiAQVKdB0SB6Wbxg4_Oix1Ri8o22xDpRzRleqq0wbdOi6oDQ2aPq6q7TqcGWszzowfadNg-6cnJSqdnhxyBF5fZwuJ0_h4nk2n9wvQh15tVBBxACKWJcl8DTWMSKwWAueQ1lmqYi8OKhE84RhwXIQCc95EdMiTyM_IYtG5Gbfu7HG-7pONpXTWNeqRdM7yVI_NeFZlnj0-g-6Nr1tvZ2nOGXg91NP3e4pbY1zFku5sVWj7FYClbtn5e5Z-f2sh68OlX3eYPGL_lzpAdgDn1WN23-q5Gz6Mt-XfgGxOYII</recordid><startdate>202309</startdate><enddate>202309</enddate><creator>Jiang, Zhiwen</creator><creator>Zhang, Haoyu</creator><creator>Ahearn, Thomas U.</creator><creator>Garcia‐Closas, Montserrat</creator><creator>Chatterjee, Nilanjan</creator><creator>Zhu, Hongtu</creator><creator>Zhan, Xiang</creator><creator>Zhao, Ni</creator><general>Wiley Subscription Services, 
Inc</general><scope>24P</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QP</scope><scope>7QR</scope><scope>7TK</scope><scope>8FD</scope><scope>FR3</scope><scope>K9.</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-7762-3949</orcidid><orcidid>https://orcid.org/0000-0001-8841-618X</orcidid></search><sort><creationdate>202309</creationdate><title>The sequence kernel association test for multicategorical outcomes</title><author>Jiang, Zhiwen ; Zhang, Haoyu ; Ahearn, Thomas U. ; Garcia‐Closas, Montserrat ; Chatterjee, Nilanjan ; Zhu, Hongtu ; Zhan, Xiang ; Zhao, Ni</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3527-a13211d4cff1584c4ee124c75b1ff98730981a6c562ed2b1765b5d40db8339593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Association analysis</topic><topic>Breast cancer</topic><topic>Estrogen receptors</topic><topic>Fibroblast growth factor receptor 2</topic><topic>Genetic analysis</topic><topic>Genomes</topic><topic>multicategorical data</topic><topic>SKAT</topic><topic>the generalized logit model</topic><topic>the proportional odds model</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jiang, Zhiwen</creatorcontrib><creatorcontrib>Zhang, Haoyu</creatorcontrib><creatorcontrib>Ahearn, Thomas U.</creatorcontrib><creatorcontrib>Garcia‐Closas, Montserrat</creatorcontrib><creatorcontrib>Chatterjee, Nilanjan</creatorcontrib><creatorcontrib>Zhu, Hongtu</creatorcontrib><creatorcontrib>Zhan, Xiang</creatorcontrib><creatorcontrib>Zhao, Ni</creatorcontrib><collection>Wiley Online Library Open Access</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Neurosciences 
Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Genetic epidemiology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jiang, Zhiwen</au><au>Zhang, Haoyu</au><au>Ahearn, Thomas U.</au><au>Garcia‐Closas, Montserrat</au><au>Chatterjee, Nilanjan</au><au>Zhu, Hongtu</au><au>Zhan, Xiang</au><au>Zhao, Ni</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The sequence kernel association test for multicategorical outcomes</atitle><jtitle>Genetic epidemiology</jtitle><addtitle>Genet Epidemiol</addtitle><date>2023-09</date><risdate>2023</risdate><volume>47</volume><issue>6</issue><spage>432</spage><epage>449</epage><pages>432-449</pages><issn>0741-0395</issn><eissn>1098-2272</eissn><abstract>Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set‐based analysis methods for genome‐wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. In this paper, we proposed a novel set‐based association analysis method, sequence kernel association test (SKAT)‐MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. 
Through comprehensive simulation studies, we showed that SKAT‐MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT‐MC to the Polish breast cancer study (PBCS), and identified that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data ($N=127,127$) with SKAT‐MC, and identified 21 significant genes in the genome. Consequently, SKAT‐MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT‐MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.</abstract><cop>United States</cop><pub>Wiley Subscription Services, Inc</pub><pmid>37078108</pmid><doi>10.1002/gepi.22527</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0002-7762-3949</orcidid><orcidid>https://orcid.org/0000-0001-8841-618X</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0741-0395 |
ispartof | Genetic epidemiology, 2023-09, Vol.47 (6), p.432-449 |
issn | 0741-0395 ; 1098-2272 |
language | eng |
recordid | cdi_proquest_miscellaneous_2803965996 |
source | Wiley Online Library Journals Frontfile Complete |
subjects | Association analysis ; Breast cancer ; Estrogen receptors ; Fibroblast growth factor receptor 2 ; Genetic analysis ; Genomes ; multicategorical data ; SKAT ; the generalized logit model ; the proportional odds model |
title | The sequence kernel association test for multicategorical outcomes |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T20%3A24%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20sequence%20kernel%20association%20test%20for%20multicategorical%20outcomes&rft.jtitle=Genetic%20epidemiology&rft.au=Jiang,%20Zhiwen&rft.date=2023-09&rft.volume=47&rft.issue=6&rft.spage=432&rft.epage=449&rft.pages=432-449&rft.issn=0741-0395&rft.eissn=1098-2272&rft_id=info:doi/10.1002/gepi.22527&rft_dat=%3Cproquest_cross%3E2850217070%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2850217070&rft_id=info:pmid/37078108&rfr_iscdi=true |