Explaining by Removing: A Unified Framework for Model Explanation

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches: SHAP, LIME, Meaningful Perturbations, and permutation tests. This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.
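To make the core idea concrete, here is a minimal illustrative sketch (not the paper's exact algorithm) of one simple removal-based explanation: "removing" a feature is simulated by replacing it with its mean over background data, and the feature's influence is the resulting change in the model's output. The function and data names below are hypothetical.

```python
# Illustrative sketch of a removal-based explanation: simulate removing
# feature j by replacing it with its mean over background data X, and
# attribute to j the drop in the model's output for instance x.
import numpy as np

def removal_explanation(model, X, x):
    """Per-feature attribution: model(x) minus model(x with feature j
    replaced by its background mean), for each feature j."""
    baseline = model(x)
    means = X.mean(axis=0)
    attributions = np.empty(x.shape[0])
    for j in range(x.shape[0]):
        x_removed = x.copy()
        x_removed[j] = means[j]          # simulate removing feature j
        attributions[j] = baseline - model(x_removed)
    return attributions

# Toy model: a known linear function, so the attributions are easy to check.
w = np.array([2.0, -1.0, 0.0])
model = lambda x: float(x @ w)

X = np.array([[0.0, 0.0, 0.0],
              [2.0, 2.0, 2.0]])          # background data, feature means [1, 1, 1]
x = np.array([3.0, 3.0, 3.0])

print(removal_explanation(model, X, x))  # → [ 4. -2.  0.]
```

For a linear model this recovers w_j * (x_j - mean_j), which is why the zero-weight third feature gets zero attribution; the framework described in the abstract varies exactly these choices (how removal is simulated, which behavior is measured, how influences are summarized).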

Detailed Description

Bibliographic Details
Published in: Journal of machine learning research, 2021-01, Vol. 22
Main authors: Covert, Ian C.; Lundberg, Scott; Lee, Su-In
Format: Article
Language: English (eng)
Subjects: Automation & Control Systems; Computer Science; Computer Science, Artificial Intelligence; Science & Technology; Technology
Online access: Full text
container_title Journal of machine learning research
container_volume 22
creator Covert, Ian C.
Lundberg, Scott
Lee, Su-In
description Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches: SHAP, LIME, Meaningful Perturbations, and permutation tests. This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.
format Article
publisher Microtome Publ, Brookline
pages 90
fulltext fulltext
identifier ISSN: 1532-4435
ispartof Journal of machine learning research, 2021-01, Vol.22
issn 1532-4435
language eng
recordid cdi_webofscience_primary_000706446200001
source Access via ACM Digital Library; Web of Science - Science Citation Index Expanded - 2021; EZB-FREE-00999 freely available EZB journals; Web of Science - Social Sciences Citation Index – 2021
subjects Automation & Control Systems
Computer Science
Computer Science, Artificial Intelligence
Science & Technology
Technology
title Explaining by Removing: A Unified Framework for Model Explanation