Deletion and Insertion Tests in Regression Models

A basic task in explainable AI (XAI) is to identify the most important features behind a prediction made by a black box function \(f\). The insertion and deletion tests of Petsiuk et al. (2018) can be used to judge the quality of algorithms that rank pixels from most to least important for a classification. Motivated by regression problems we establish a formula for their area under the curve (AUC) criteria in terms of certain main effects and interactions in an anchored decomposition of \(f\). We find an expression for the expected value of the AUC under a random ordering of inputs to \(f\) and propose an alternative area above a straight line for the regression setting. We use this criterion to compare feature importances computed by integrated gradients (IG) to those computed by Kernel SHAP (KS) as well as LIME, DeepLIFT, vanilla gradient and input\(\times\)gradient methods. KS has the best overall performance in two datasets we consider but it is very expensive to compute. We find that IG is nearly as good as KS while being much faster. Our comparison problems include some binary inputs that pose a challenge to IG because it must use values between the possible variable levels and so we consider ways to handle binary variables in IG. We show that sorting variables by their Shapley value does not necessarily give the optimal ordering for an insertion-deletion test. It will however do that for monotone functions of additive models, such as logistic regression.
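
The deletion test the abstract builds on is easy to state: features are blanked out from most to least important according to a candidate ranking, the black box \(f\) is re-evaluated after each step, and the resulting prediction curve is integrated. The Python sketch below is a minimal illustration of that idea, not the authors' implementation; the function name `deletion_auc`, the zero baseline, and the toy additive model are all assumptions made for this example.

```python
import numpy as np

def deletion_auc(f, x, ranking, baseline):
    """Illustrative deletion test for one instance x (not the paper's code).

    f        : black-box model mapping an (n, d) array to n predictions
    x        : shape-(d,) input being explained
    ranking  : feature indices ordered from most to least important
    baseline : shape-(d,) values that stand in for "deleted" features
    """
    z = x.copy()
    curve = [float(f(z[None, :])[0])]       # all features present
    for j in ranking:
        z[j] = baseline[j]                  # "delete" feature j by overwriting it
        curve.append(float(f(z[None, :])[0]))
    c = np.asarray(curve)
    # Trapezoid rule on a uniform grid over [0, 1] (fraction of features deleted):
    # with n points and dx = 1/(n-1), the integral is the mean of pairwise midpoints.
    return np.mean((c[:-1] + c[1:]) / 2.0)

# Toy additive model with a zero baseline: deleting in order of |beta_j * x_j|
# moves the prediction toward the baseline value fastest.
rng = np.random.default_rng(0)
beta = np.array([4.0, 2.0, 1.0, 0.5])
f = lambda X: X @ beta
x = rng.normal(size=4)
base = np.zeros(4)
print(deletion_auc(f, x, np.argsort(-np.abs(beta * x)), base))  # ranked order
print(deletion_auc(f, x, rng.permutation(4), base))             # random order
```

In the classification setting of Petsiuk et al. (2018) a small deletion AUC signals a good ranking; in regression the prediction can move in either direction, which is precisely why the abstract proposes an area measured against a straight line instead. The exact construction of that criterion is given in the paper.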

Bibliographic Details

Published in: arXiv.org, 2023-08
Main authors: Hama, Naofumi; Mase, Masayoshi; Owen, Art B
Format: Article
Language: English
Subjects: Algorithms; Computation; Deletion; Insertion; Interpolation; Regression models; Straight lines; Variables
EISSN: 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)
Online access: Full text