Sparse Variable Selection on High Dimensional Heterogeneous Data with Tree Structured Responses

We consider the problem of sparse variable selection on high-dimensional heterogeneous data sets, which has attracted renewed interest with the growth of biological and medical data sets that have complex, non-i.i.d. structures and very large numbers of response variables. The heterogeneity is...

Detailed description

Bibliographic details
Published in: IEEE Access, 2024-01, Vol. 12, p. 1-1
Main authors: Liu, Hui; Liu, Xiang; Diao, Jing; Ye, Wenting; Liu, Xueling; Wei, Dehui
Format: Article
Language: English
Online access: Full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE access
container_volume 12
creator Liu, Hui
Liu, Xiang
Diao, Jing
Ye, Wenting
Liu, Xueling
Wei, Dehui
description We consider the problem of sparse variable selection on high-dimensional heterogeneous data sets, which has attracted renewed interest with the growth of biological and medical data sets that have complex, non-i.i.d. structures and very large numbers of response variables. The heterogeneity is likely to confound the association between explanatory variables and responses, resulting in many false discoveries when the Lasso or its variants are applied naïvely. Developing effective confounder correction methods is therefore an increasingly active research topic. However, applying recent confounder correction methods directly gives poor performance because they ignore the complex interdependency among response variables. To improve on current variable selection methods, we introduce the tree-guided sparse linear mixed model, which uses the dependency information among multiple responses to explore the cluster structure of the responses and to select the active variables from heterogeneous data. Through extensive experiments on synthetic and real data sets, we show that the proposed model outperforms existing methods and achieves the highest area under the ROC curve.
doi_str_mv 10.1109/ACCESS.2024.3384309
format Article
publisher Piscataway: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
coden IAECCG
date 2024-01-01
orcid 0000-0002-1330-6469; 0000-0002-2478-7659; 0009-0007-4940-8903; 0009-0001-2727-033X; 0009-0006-8550-3767; 0000-0002-6952-5062
oa free_for_read
toplevel peer_reviewed; online_resources
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2024-01, Vol.12, p.1-1
issn 2169-3536
2169-3536
language eng
recordid cdi_ieee_primary_10488404
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Biological system modeling
Confounding factors
Correlation
Covariance matrices
Datasets
genome-wide association study
Genomics
Heterogeneity
Input variables
Mathematical models
mixed model
Regression tree analysis
variable selection
Variables
Vectors
title Sparse Variable Selection on High Dimensional Heterogeneous Data with Tree Structured Responses
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T05%3A23%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sparse%20Variable%20Selection%20on%20High%20Dimensional%20Heterogeneous%20Data%20with%20Tree%20Structured%20Responses&rft.jtitle=IEEE%20access&rft.au=Liu,%20Hui&rft.date=2024-01-01&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3384309&rft_dat=%3Cproquest_ieee_%3E3040054018%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3040054018&rft_id=info:pmid/&rft_ieee_id=10488404&rft_doaj_id=oai_doaj_org_article_1b1ef698156b4e45b7bc98012a462253&rfr_iscdi=true
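
The description above says the model uses dependency information among multiple responses, arranged as a tree, to select active variables, but this record does not give the objective function. As an illustration only, the sketch below assumes the "tree-guided" part resembles a tree-structured group-lasso penalty, in which every node of a tree over the responses defines a group of response columns. This is an assumption, not the authors' confirmed formulation. The function name `tree_group_penalty`, the example tree, and the node weights are all hypothetical.

```python
import math


def tree_group_penalty(B, groups, weights):
    """Return sum over variables j and tree nodes v of w_v * ||B[j][G_v]||_2.

    B       : list of rows, one per explanatory variable, one column per response
    groups  : list of lists of response (column) indices, one list per tree node
    weights : one non-negative weight per tree node (hypothetical values below)
    """
    total = 0.0
    for row in B:
        for cols, w in zip(groups, weights):
            # Euclidean norm of this variable's coefficients within the node's group
            total += w * math.sqrt(sum(row[k] ** 2 for k in cols))
    return total


# Hypothetical tree over three responses: leaves {0}, {1}, {2},
# one internal node {0, 1}, and the root {0, 1, 2}.
groups = [[0], [1], [2], [0, 1], [0, 1, 2]]
weights = [1.0, 1.0, 1.0, 0.5, 0.25]

# Two explanatory variables: the first affects responses 0 and 1,
# the second affects no response.
B = [[3.0, 4.0, 0.0],
     [0.0, 0.0, 0.0]]

penalty = tree_group_penalty(B, groups, weights)
print(penalty)
```

Under this kind of penalty, a variable with all-zero coefficients contributes nothing, while a variable that is active on a cluster of related responses is charged once per tree node that contains those responses. That is one way a response tree can encourage a variable to be selected or dropped jointly for related responses.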