Sparse Variable Selection on High Dimensional Heterogeneous Data with Tree Structured Responses
We consider the problem of sparse variable selection on high-dimensional heterogeneous data sets, which has attracted renewed interest due to the growth of biological and medical data sets with complex, non-i.i.d. structures and large numbers of response variables. The heterogeneity is...
Published in: | IEEE access 2024-01, Vol.12, p.1-1 |
---|---|
Main authors: | Liu, Hui ; Liu, Xiang ; Diao, Jing ; Ye, Wenting ; Liu, Xueling ; Wei, Dehui |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE access |
container_volume | 12 |
creator | Liu, Hui ; Liu, Xiang ; Diao, Jing ; Ye, Wenting ; Liu, Xueling ; Wei, Dehui |
description | We consider the problem of sparse variable selection on high-dimensional heterogeneous data sets, which has attracted renewed interest due to the growth of biological and medical data sets with complex, non-i.i.d. structures and large numbers of response variables. The heterogeneity is likely to confound the association between explanatory variables and responses, resulting in many false discoveries when Lasso or its variants are naïvely applied. Developing effective confounder correction methods is therefore a growing focus among researchers. However, applying recent confounder correction methods directly yields poor performance because they ignore the complex interdependency among response variables. To improve on current variable selection methods, we introduce the tree-guided sparse linear mixed model, which uses the dependency information among multiple responses to explore how specific the clusters are and to select the active variables from heterogeneous data. Through extensive experiments on synthetic and real data sets, we show that the proposed model outperforms existing methods and achieves the highest area under the ROC curve. |
doi_str_mv | 10.1109/ACCESS.2024.3384309 |
format | Article |
fullrecord | Record ID: TN_cdi_ieee_primary_10488404 ; IEEE ID: 10488404 ; DOAJ ID: oai_doaj_org_article_1b1ef698156b4e45b7bc98012a462253 ; ProQuest ID: 3040054018 ; Source format: XML ; Record type: article ; Publisher: Piscataway: IEEE ; ISSN: 2169-3536 ; EISSN: 2169-3536 ; DOI: 10.1109/ACCESS.2024.3384309 ; CODEN: IAECCG ; Date: 2024-01-01 ; Rights: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 ; Peer reviewed: yes ; Open access: free_for_read ; ORCID: https://orcid.org/0000-0002-1330-6469 ; https://orcid.org/0000-0002-2478-7659 ; https://orcid.org/0009-0007-4940-8903 ; https://orcid.org/0009-0001-2727-033X ; https://orcid.org/0009-0006-8550-3767 ; https://orcid.org/0000-0002-6952-5062 ; Full text (HTML): https://ieeexplore.ieee.org/document/10488404 ; Collections: IEEE All-Society Periodicals Package (ASPP) 2005-present ; IEEE Open Access Journals ; IEEE All-Society Periodicals Package (ASPP) 1998-Present ; IEEE Electronic Library (IEL) ; CrossRef ; Computer and Information Systems Abstracts ; Electronics & Communications Abstracts ; Engineered Materials Abstracts ; METADEX ; Technology Research Database ; Materials Research Database ; ProQuest Computer Science Collection ; Advanced Technologies Database with Aerospace ; Computer and Information Systems Abstracts Academic ; Computer and Information Systems Abstracts Professional ; DOAJ Directory of Open Access Journals |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024-01, Vol.12, p.1-1 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_ieee_primary_10488404 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Biological system modeling ; Confounding factors ; Correlation ; Covariance matrices ; Datasets ; genome-wide association study ; Genomics ; Heterogeneity ; Input variables ; Mathematical models ; mixed model ; Regression tree analysis ; variable selection ; Variables ; Vectors |
title | Sparse Variable Selection on High Dimensional Heterogeneous Data with Tree Structured Responses |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T05%3A23%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sparse%20Variable%20Selection%20on%20High%20Dimensional%20Heterogeneous%20Data%20with%20Tree%20Structured%20Responses&rft.jtitle=IEEE%20access&rft.au=Liu,%20Hui&rft.date=2024-01-01&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3384309&rft_dat=%3Cproquest_ieee_%3E3040054018%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3040054018&rft_id=info:pmid/&rft_ieee_id=10488404&rft_doaj_id=oai_doaj_org_article_1b1ef698156b4e45b7bc98012a462253&rfr_iscdi=true |