When stakes are high: Balancing accuracy and transparency with Model-Agnostic Interpretable Data-driven suRRogates

Bibliographic details
Published in:	Expert systems with applications, 2022-09, Vol. 202, p. 117230, Article 117230
Main authors:	Henckaerts, Roel; Antonio, Katrien; Côté, Marie-Pier
Format:	Article
Language:	English
Subjects:	Algorithms; Decision making; Feature selection; Generalized linear models; GLM; Global surrogate; Insurance; Performance prediction; Prediction models; Regulated industries; Segmentation; Statistical models; Tables (data); XAI
Online access:	Full text

Description
Abstract:	Technological advancements allow to develop high-performance black box predictive models. However, strictly regulated industries (like banking and insurance) ask for transparent decision-making algorithms. We therefore present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr) suited for structured tabular data. Knowledge is extracted from a black box via partial dependence effects. These are used to perform smart feature engineering by grouping variable values. This results in a segmentation of the feature space with automatic variable selection. A transparent generalized linear model (GLM) is fit to the features in categorical format and their relevant interactions. This GLM serves as a global surrogate to the original black box and replaces it in production. We demonstrate our R package maidrr with a case study on general insurance claim frequency modeling for six publicly available datasets. Our maidrr GLM closely approximates a gradient boosting machine (GBM) black box and outperforms both a linear and tree surrogate as benchmarks.
• Procedure to develop an interpretable global surrogate for a complex system.
• Surrogate closely approximates a black box model regarding accuracy and fidelity.
• Automatic feature selection, segmentation and both global and local explanations.
• Satisfy transparency needs of a strictly regulated industry or high-stakes decision.
• Case study on insurance claim frequency prediction for six public datasets.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2022.117230
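
The first two steps named in the abstract (extracting partial dependence effects from a black box, then grouping variable values with similar effects) can be illustrated with a small sketch. It is written in plain Python rather than with the authors' R package maidrr, whose functions are not shown in this record. Everything in it is an assumption made for illustration: the toy black_box function, the features age and power, the synthetic data, and the grouping rule (cutting at the largest jumps in the effect), which is a simple stand-in and not the grouping method used in the paper. The final GLM fit is not shown.

```python
import random


def black_box(age, power):
    """Hypothetical stand-in for a fitted black box such as a GBM (not from the paper)."""
    age_effect = 0.30 if age < 25 else (0.10 if age < 60 else 0.18)
    return age_effect * (1.0 + 0.02 * power)


def partial_dependence(model, data, feature_index, grid):
    """Average prediction over the data when one feature is forced to each grid value."""
    effects = []
    for value in grid:
        total = 0.0
        for row in data:
            modified = list(row)
            modified[feature_index] = value  # overwrite the feature of interest
            total += model(*modified)
        effects.append(total / len(data))
    return effects


def group_by_effect(grid, effects, n_groups):
    """Split the ordered grid into contiguous segments at the largest jumps in the effect.

    Simplified illustration only; assumed here, not the paper's grouping algorithm.
    """
    jumps = sorted(
        range(1, len(grid)),
        key=lambda i: abs(effects[i] - effects[i - 1]),
        reverse=True,
    )
    cuts = sorted(jumps[: n_groups - 1])
    segments, start = [], 0
    for cut in cuts + [len(grid)]:
        segments.append(grid[start:cut])
        start = cut
    return segments


# Synthetic portfolio of (age, power) pairs; purely illustrative data.
random.seed(0)
data = [(random.randint(18, 80), random.randint(4, 12)) for _ in range(200)]

age_grid = list(range(18, 81))
age_pd = partial_dependence(black_box, data, 0, age_grid)
age_segments = group_by_effect(age_grid, age_pd, 3)

for segment in age_segments:
    print(f"age segment: {segment[0]}-{segment[-1]}")
```

In a full workflow along the lines the abstract describes, each feature would be replaced by its segment label, and a GLM would then be fit on those categorical features and their relevant interactions.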