Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression

Feature selection is an essential process in machine learning, especially when dealing with high-dimensional datasets. It helps reduce the complexity of machine learning models, improve performance, mitigate overfitting, and decrease computation time. This paper presents a novel feature selection fr...

Detailed description

Bibliographic details
Main authors:	Kraev, Egor; Koseoglu, Baran; Traverso, Luca; Topiwalla, Mohammed
Format:	Article
Language:	eng
Subjects:	Computer Science - Learning
Online access:	Order full text

creator	Kraev, Egor; Koseoglu, Baran; Traverso, Luca; Topiwalla, Mohammed
description	Feature selection is an essential process in machine learning, especially when dealing with high-dimensional datasets. It helps reduce the complexity of machine learning models, improve performance, mitigate overfitting, and decrease computation time. This paper presents a novel feature selection framework, shap-select. The framework conducts a linear or logistic regression of the target on the Shapley values of the features, on the validation set, and uses the signs and significance levels of the regression coefficients to implement an efficient heuristic for feature selection in tabular regression and classification tasks. We evaluate shap-select on the Kaggle credit card fraud dataset, demonstrating its effectiveness compared to established methods such as Recursive Feature Elimination (RFE), HISEL (a mutual information-based feature selection method), Boruta and a simpler Shapley value-based method. Our findings show that shap-select combines interpretability, computational efficiency, and performance, offering a robust solution for feature selection.
doi_str_mv	10.48550/arxiv.2410.06815
format	Article
creationdate	2024-10-09
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa	free_for_read
linktorsrc	https://arxiv.org/abs/2410.06815
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2410.06815
language	eng
recordid	cdi_arxiv_primary_2410_06815
source	arXiv.org
subjects	Computer Science - Learning
title	Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T05%3A50%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Shap-Select:%20Lightweight%20Feature%20Selection%20Using%20SHAP%20Values%20and%20Regression&rft.au=Kraev,%20Egor&rft.date=2024-10-09&rft_id=info:doi/10.48550/arxiv.2410.06815&rft_dat=%3Carxiv_GOX%3E2410_06815%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
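
The description field in this record outlines the method: regress the target on the per-feature Shapley values computed on a validation set, then keep features according to the sign and significance of their regression coefficients. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation or the shap-select package API. It swaps in a plain linear model, whose Shapley values have the closed form coef_j * (x_j - mean_j) when features are treated as independent, so no SHAP library is needed. All function names, the synthetic data, and the t-statistic cutoff of 3.0 (used here in place of a p-value threshold) are hypothetical choices made for illustration.

```python
import numpy as np


def linear_shap_values(X, coef):
    """Closed-form Shapley values of a linear model (independent-feature assumption)."""
    return (X - X.mean(axis=0)) * coef


def ols_t_stats(S, y):
    """Ordinary least squares of y on the columns of S plus an intercept.

    Returns the slope coefficients and their t-statistics (intercept dropped).
    """
    n, k = S.shape
    A = np.column_stack([np.ones(n), S])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - k - 1)       # unbiased residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)      # coefficient covariance matrix
    t = beta / np.sqrt(np.diag(cov))
    return beta[1:], t[1:]


def select_features(S, y, t_threshold=3.0):
    """Keep features whose coefficient is positive and clearly significant.

    The cutoff on the t-statistic is an assumption of this sketch.
    """
    coef, t = ols_t_stats(S, y)
    return [j for j in range(S.shape[1]) if coef[j] > 0 and t[j] > t_threshold]


# Synthetic data (hypothetical): features 0 and 1 drive the target, 2-4 are noise.
rng = np.random.default_rng(0)
n, k = 1000, 5
X = rng.normal(size=(n, k))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=n)

X_train, X_val = X[:500], X[500:]
y_train, y_val = y[:500], y[500:]

# Fit the stand-in linear model on the training split.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
model_coef = np.linalg.lstsq(A_train, y_train, rcond=None)[0][1:]

# Shapley values on the validation split, then the selection regression.
S_val = linear_shap_values(X_val, model_coef)
selected = select_features(S_val, y_val)
print("selected feature indices:", selected)
```

With a tree-based model, the Shapley values would instead come from an explainer applied to the validation set; the regression and selection steps would stay the same in this sketch. For a classification target, the description indicates a logistic rather than a linear regression.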