Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression

Feature selection is an essential process in machine learning, especially when dealing with high-dimensional datasets. It helps reduce the complexity of machine learning models, improve performance, mitigate overfitting, and decrease computation time. This paper presents a novel feature selection fr...

Bibliographic details
Main authors: Kraev, Egor; Koseoglu, Baran; Traverso, Luca; Topiwalla, Mohammed
Format: Article
Language: eng
creator Kraev, Egor
Koseoglu, Baran
Traverso, Luca
Topiwalla, Mohammed
description Feature selection is an essential process in machine learning, especially when dealing with high-dimensional datasets. It helps reduce the complexity of machine learning models, improve performance, mitigate overfitting, and decrease computation time. This paper presents a novel feature selection framework, shap-select. The framework conducts a linear or logistic regression of the target on the Shapley values of the features, on the validation set, and uses the signs and significance levels of the regression coefficients to implement an efficient heuristic for feature selection in tabular regression and classification tasks. We evaluate shap-select on the Kaggle credit card fraud dataset, demonstrating its effectiveness compared to established methods such as Recursive Feature Elimination (RFE), HISEL (a mutual information-based feature selection method), Boruta and a simpler Shapley value-based method. Our findings show that shap-select combines interpretability, computational efficiency, and performance, offering a robust solution for feature selection.
doi_str_mv 10.48550/arxiv.2410.06815
format Article
creationdate 2024-10-09
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
sourcetype Open Access Repository
linktorsrc https://arxiv.org/abs/2410.06815
backlink https://doi.org/10.48550/arXiv.2410.06815
identifier DOI: 10.48550/arxiv.2410.06815
language eng
recordid cdi_arxiv_primary_2410_06815
source arXiv.org
subjects Computer Science - Learning
title Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression
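
The description field says that shap-select regresses the target on the Shapley values of the features, computed on the validation set, and keeps or drops features based on the signs and significance levels of the regression coefficients. The following is a minimal sketch of that idea for the regression case only, under several assumptions that are not taken from the record: the underlying model is a plain least-squares linear model (for which Shapley values have the closed form w_j * (x_j - mean_j) when features are treated as independent), p-values use a normal approximation to the t-distribution, and a feature is kept when its coefficient is positive and significant at alpha = 0.05. All function names are hypothetical and this is not the API of the shap-select package.

```python
import math
import numpy as np


def linear_shap_values(weights, X, background_mean):
    """Closed-form Shapley values of a linear model (features treated as independent)."""
    return (X - background_mean) * weights


def ols_with_pvalues(Z, y):
    """OLS of y on the columns of Z (plus intercept); returns slopes and two-sided p-values."""
    n, k = Z.shape
    A = np.column_stack([np.ones(n), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - k - 1)           # residual variance estimate
    cov = sigma2 * np.linalg.pinv(A.T @ A)         # coefficient covariance matrix
    t_stats = beta / np.sqrt(np.diag(cov))
    # Assumption: normal approximation to the t-distribution (reasonable for large n).
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return beta[1:], p_values[1:]                  # drop the intercept


def select_features(shap_vals, y, names, alpha=0.05):
    """Keep features whose SHAP-regression coefficient is positive and significant."""
    coef, p_values = ols_with_pvalues(shap_vals, y)
    return [n for n, c, p in zip(names, coef, p_values) if c > 0 and p < alpha]


# Synthetic demo data: x0 and x1 drive the target, x2 and x3 are pure noise.
rng = np.random.default_rng(0)
names = ["x0", "x1", "x2", "x3"]
X = rng.normal(size=(1000, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=1000)
X_train, X_val = X[:500], X[500:]
y_train, y_val = y[:500], y[500:]

# Fit the stand-in model on the training split.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
fitted, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
weights = fitted[1:]

# Shapley values on the validation split, then the significance-based selection.
shap_val = linear_shap_values(weights, X_val, X_train.mean(axis=0))
selected = select_features(shap_val, y_val, names)
print(selected)
```

In this sketch an informative feature yields a SHAP column whose regression coefficient is close to 1 and strongly significant, while a noise feature yields a coefficient with an arbitrary sign and a roughly uniform p-value, so it is usually, though not always, dropped. For a tree-based model or a classification task, the Shapley values would come from a dedicated explainer and the second stage would be a logistic regression, as the description states.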