Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression
Main authors: | Kraev, Egor; Koseoglu, Baran; Traverso, Luca; Topiwalla, Mohammed |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Learning |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Kraev, Egor; Koseoglu, Baran; Traverso, Luca; Topiwalla, Mohammed |
description | Feature selection is an essential process in machine learning, especially
when dealing with high-dimensional datasets. It helps reduce the complexity of
machine learning models, improve performance, mitigate overfitting, and
decrease computation time. This paper presents a novel feature selection
framework, shap-select. On the validation set, the framework fits a linear or
logistic regression of the target on the features' Shapley values, and
uses the signs and significance levels of the regression coefficients to
implement an efficient heuristic for feature selection in tabular regression
and classification tasks. We evaluate shap-select on the Kaggle credit card
fraud dataset, demonstrating its effectiveness compared to established methods
such as Recursive Feature Elimination (RFE), HISEL (a mutual information-based
feature selection method), Boruta and a simpler Shapley value-based method. Our
findings show that shap-select combines interpretability, computational
efficiency, and performance, offering a robust solution for feature selection. |
doi_str_mv | 10.48550/arxiv.2410.06815 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2410_06815</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2410_06815</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2410_068153</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgEKGJhZGJpyMngHZyQW6Aan5qQml1gp-GSmZ5SUp4JIBbfUxJLSolQFiFxmfp5CaHFmXrpCsIdjgEJYYk5parFCYl6KQlBqelFqcTFQAQ8Da1piTnEqL5TmZpB3cw1x9tAFWxtfUJSZm1hUGQ-yPh5svTFhFQBXxzoL</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression</title><source>arXiv.org</source><creator>Kraev, Egor ; Koseoglu, Baran ; Traverso, Luca ; Topiwalla, Mohammed</creator><creatorcontrib>Kraev, Egor ; Koseoglu, Baran ; Traverso, Luca ; Topiwalla, Mohammed</creatorcontrib><description>Feature selection is an essential process in machine learning, especially
when dealing with high-dimensional datasets. It helps reduce the complexity of
machine learning models, improve performance, mitigate overfitting, and
decrease computation time. This paper presents a novel feature selection
framework, shap-select. The framework conducts a linear or logistic regression
of the target on the Shapley values of the features, on the validation set, and
uses the signs and significance levels of the regression coefficients to
implement an efficient heuristic for feature selection in tabular regression
and classification tasks. We evaluate shap-select on the Kaggle credit card
fraud dataset, demonstrating its effectiveness compared to established methods
such as Recursive Feature Elimination (RFE), HISEL (a mutual information-based
feature selection method), Boruta and a simpler Shapley value-based method. Our
findings show that shap-select combines interpretability, computational
efficiency, and performance, offering a robust solution for feature selection.</description><identifier>DOI: 10.48550/arxiv.2410.06815</identifier><language>eng</language><subject>Computer Science - Learning</subject><creationdate>2024-10</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2410.06815$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2410.06815$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Kraev, Egor</creatorcontrib><creatorcontrib>Koseoglu, Baran</creatorcontrib><creatorcontrib>Traverso, Luca</creatorcontrib><creatorcontrib>Topiwalla, Mohammed</creatorcontrib><title>Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression</title><description>Feature selection is an essential process in machine learning, especially
when dealing with high-dimensional datasets. It helps reduce the complexity of
machine learning models, improve performance, mitigate overfitting, and
decrease computation time. This paper presents a novel feature selection
framework, shap-select. The framework conducts a linear or logistic regression
of the target on the Shapley values of the features, on the validation set, and
uses the signs and significance levels of the regression coefficients to
implement an efficient heuristic for feature selection in tabular regression
and classification tasks. We evaluate shap-select on the Kaggle credit card
fraud dataset, demonstrating its effectiveness compared to established methods
such as Recursive Feature Elimination (RFE), HISEL (a mutual information-based
feature selection method), Boruta and a simpler Shapley value-based method. Our
findings show that shap-select combines interpretability, computational
efficiency, and performance, offering a robust solution for feature selection.</description><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgEKGJhZGJpyMngHZyQW6Aan5qQml1gp-GSmZ5SUp4JIBbfUxJLSolQFiFxmfp5CaHFmXrpCsIdjgEJYYk5parFCYl6KQlBqelFqcTFQAQ8Da1piTnEqL5TmZpB3cw1x9tAFWxtfUJSZm1hUGQ-yPh5svTFhFQBXxzoL</recordid><startdate>20241009</startdate><enddate>20241009</enddate><creator>Kraev, Egor</creator><creator>Koseoglu, Baran</creator><creator>Traverso, Luca</creator><creator>Topiwalla, Mohammed</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241009</creationdate><title>Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression</title><author>Kraev, Egor ; Koseoglu, Baran ; Traverso, Luca ; Topiwalla, Mohammed</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2410_068153</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Kraev, Egor</creatorcontrib><creatorcontrib>Koseoglu, Baran</creatorcontrib><creatorcontrib>Traverso, Luca</creatorcontrib><creatorcontrib>Topiwalla, Mohammed</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kraev, Egor</au><au>Koseoglu, Baran</au><au>Traverso, Luca</au><au>Topiwalla, Mohammed</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression</atitle><date>2024-10-09</date><risdate>2024</risdate><abstract>Feature selection is an essential process in 
machine learning, especially
when dealing with high-dimensional datasets. It helps reduce the complexity of
machine learning models, improve performance, mitigate overfitting, and
decrease computation time. This paper presents a novel feature selection
framework, shap-select. The framework conducts a linear or logistic regression
of the target on the Shapley values of the features, on the validation set, and
uses the signs and significance levels of the regression coefficients to
implement an efficient heuristic for feature selection in tabular regression
and classification tasks. We evaluate shap-select on the Kaggle credit card
fraud dataset, demonstrating its effectiveness compared to established methods
such as Recursive Feature Elimination (RFE), HISEL (a mutual information-based
feature selection method), Boruta and a simpler Shapley value-based method. Our
findings show that shap-select combines interpretability, computational
efficiency, and performance, offering a robust solution for feature selection.</abstract><doi>10.48550/arxiv.2410.06815</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2410.06815 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2410_06815 |
source | arXiv.org |
subjects | Computer Science - Learning |
title | Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T05%3A50%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Shap-Select:%20Lightweight%20Feature%20Selection%20Using%20SHAP%20Values%20and%20Regression&rft.au=Kraev,%20Egor&rft.date=2024-10-09&rft_id=info:doi/10.48550/arxiv.2410.06815&rft_dat=%3Carxiv_GOX%3E2410_06815%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
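The description above summarizes the core step of shap-select: regress the target on the per-feature Shapley values on a validation set, then use the signs and significance of the coefficients to decide which features to keep. The following is a minimal sketch of the linear-regression case only, under stated assumptions. It assumes the Shapley values were already computed and are passed in as a matrix. The function name `shap_regression_select`, the significance level `alpha=0.05`, and the single-pass rule "keep a feature only if its coefficient is positive and its p-value is below `alpha`" are hypothetical choices. They are not the shap-select package's API or the paper's exact heuristic. P-values use a normal approximation to the t distribution, which is reasonable only for reasonably large validation sets.

```python
import math

import numpy as np


def shap_regression_select(shap_values, y, alpha=0.05):
    """Hypothetical sketch of a shap-select-style filter (linear-regression case).

    shap_values: (n_samples, n_features) array of Shapley values for the
    validation rows; y: target values for the same rows. Returns the indices of
    features whose regression coefficient is positive and significant at `alpha`.
    The default alpha and the keep/drop rule are assumptions, not the paper's code.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = shap_values.shape
    # Design matrix: an intercept column followed by one column of SHAP values per feature.
    X = np.column_stack([np.ones(n), shap_values])
    # Ordinary least squares fit of the target on the Shapley values.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = residuals @ residuals / dof
    # Coefficient standard errors from the diagonal of sigma^2 (X^T X)^{-1}.
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t_stats = beta / se
    # Two-sided p-values via a normal approximation (assumption: large n).
    p_values = [math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats]
    selected = []
    for j in range(k):
        coef, p = beta[j + 1], p_values[j + 1]  # index 0 is the intercept
        # Drop a feature if its coefficient is negative or not significant.
        if coef > 0 and p < alpha:
            selected.append(j)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.normal(size=(500, 3))  # synthetic stand-in for validation-set Shapley values
    y = S[:, 0] + S[:, 1] - 0.5 * S[:, 2] + 0.1 * rng.normal(size=500)
    print(shap_regression_select(S, y))  # expected: [0, 1]
```

In the synthetic example, feature 2 is dropped because its coefficient is negative, even though it is highly significant. For a classification target, the description specifies logistic rather than linear regression, and the same sign-and-significance logic would be applied to the logistic coefficients.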