Shapley Value on Probabilistic Classifiers
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Data valuation has become an increasingly significant discipline in data
science due to the economic value of data. In the context of machine learning
(ML), data valuation methods aim to equitably measure the contribution of each
data point to the utility of an ML model. One prevalent method is Shapley
value, which helps identify data points that are beneficial or detrimental to
an ML model. However, traditional Shapley-based data valuation methods may not
effectively distinguish between beneficial and detrimental training data points
for probabilistic classifiers. In this paper, we propose Probabilistic Shapley
(P-Shapley) value by constructing a probability-wise utility function that
leverages the predicted class probabilities of probabilistic classifiers rather
than binarized prediction results in the traditional Shapley value. We also
offer several activation functions for confidence calibration to effectively
quantify the marginal contribution of each data point to the probabilistic
classifiers. Extensive experiments on four real-world datasets demonstrate the
effectiveness of our proposed P-Shapley value in evaluating the importance of
data for building a high-usability and trustworthy ML model. |
---|---|
DOI: | 10.48550/arxiv.2306.07171 |
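
The abstract contrasts a utility built on binarized predictions with one built on predicted class probabilities. The following is a minimal sketch of that contrast, not the paper's implementation: the abstract does not give the exact P-Shapley utility or its calibration activation functions, so `probability_utility` (mean predicted probability of the true class), the toy centroid classifier `predict_proba`, the chance-level score for the empty set, and the permutation-sampling estimator `monte_carlo_shapley` are all assumptions introduced here for illustration.

```python
import numpy as np

def predict_proba(X_train, y_train, X_test, n_classes):
    """Toy probabilistic classifier (assumption, not from the paper):
    softmax over negative squared distances to class centroids.
    Classes absent from the training subset get probability 0."""
    present = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in present])
    d2 = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    proba = np.zeros((len(X_test), n_classes))
    proba[:, present] = p
    return proba

def accuracy_utility(proba, y_test):
    """Utility from binarized predictions: fraction of correct argmax labels."""
    return float((proba.argmax(axis=1) == y_test).mean())

def probability_utility(proba, y_test):
    """Probability-wise utility (assumed form): mean predicted
    probability assigned to the true class."""
    return float(proba[np.arange(len(y_test)), y_test].mean())

def monte_carlo_shapley(X, y, X_test, y_test, utility, n_classes,
                        n_perms=50, seed=0):
    """Permutation-sampling estimate of each training point's Shapley value."""
    rng = np.random.default_rng(seed)
    n = len(y)
    values = np.zeros(n)
    empty_score = 1.0 / n_classes  # assumption: chance level for the empty set
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = empty_score
        for k in range(1, n + 1):
            idx = perm[:k]
            proba = predict_proba(X[idx], y[idx], X_test, n_classes)
            score = utility(proba, y_test)
            values[perm[k - 1]] += score - prev  # marginal contribution
            prev = score
    return values / n_perms

# Toy data: two Gaussian blobs, with the label of training point 0 flipped.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(-2, 0.5, (5, 2)), data_rng.normal(2, 0.5, (5, 2))])
y = np.array([0] * 5 + [1] * 5)
y[0] = 1  # deliberately mislabeled point
X_test = np.vstack([data_rng.normal(-2, 0.5, (20, 2)), data_rng.normal(2, 0.5, (20, 2))])
y_test = np.array([0] * 20 + [1] * 20)

acc_values = monte_carlo_shapley(X, y, X_test, y_test, accuracy_utility, 2)
prob_values = monte_carlo_shapley(X, y, X_test, y_test, probability_utility, 2)
```

Comparing `acc_values[0]` with `prob_values[0]` shows how each utility scores the mislabeled point. With either utility, the estimated values sum to the full-data utility minus the empty-set score, because the marginal contributions telescope within every sampled permutation.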