When stakes are high: Balancing accuracy and transparency with Model-Agnostic Interpretable Data-driven suRRogates
Technological advancements allow to develop high-performance black box predictive models. However, strictly regulated industries (like banking and insurance) ask for transparent decision-making algorithms. We therefore present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRoga...
Gespeichert in:
Veröffentlicht in: | Expert systems with applications 2022-09, Vol.202, p.117230, Article 117230 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Technological advancements allow to develop high-performance black box predictive models. However, strictly regulated industries (like banking and insurance) ask for transparent decision-making algorithms. We therefore present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr) suited for structured tabular data. Knowledge is extracted from a black box via partial dependence effects. These are used to perform smart feature engineering by grouping variable values. This results in a segmentation of the feature space with automatic variable selection. A transparent generalized linear model (GLM) is fit to the features in categorical format and their relevant interactions. This GLM serves as a global surrogate to the original black box and replaces it in production. We demonstrate our R package maidrr with a case study on general insurance claim frequency modeling for six publicly available datasets. Our maidrr GLM closely approximates a gradient boosting machine (GBM) black box and outperforms both a linear and tree surrogate as benchmarks.
•Procedure to develop an interpretable global surrogate for a complex system.•Surrogate closely approximates a black box model regarding accuracy and fidelity.•Automatic feature selection, segmentation and both global and local explanations.•Satisfy transparency needs of a strictly regulated industry or high-stakes decision.•Case study on insurance claim frequency prediction for six public datasets. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2022.117230 |