Interpreting Undesirable Pixels for Image Classification on Black-Box Models
Main authors:
Format: Article
Language: English
Abstract: In an effort to interpret black-box models, research on explanation methods has advanced in recent years. Most studies have tried to identify input pixels that are crucial to the prediction of a classifier. While this approach is useful for analysing the characteristics of black-box models, it is also important to investigate pixels that interfere with the prediction. To tackle this issue, in this paper we propose an explanation method that visualizes the regions that are undesirable for classifying an image as a target class. Specifically, we divide the concept of undesirable regions into two terms: (1) factors for a target class, which hinder black-box models from identifying the intrinsic characteristics of the target class, and (2) factors for non-target classes, which are regions important for an image to be classified as other classes. We visualize such undesirable regions on heatmaps to qualitatively validate the proposed method. Furthermore, we present an evaluation metric to provide quantitative results on ImageNet.
DOI: 10.48550/arxiv.1909.12446
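The abstract distinguishes two kinds of undesirable regions: pixels that hinder the model from recognizing the target class, and pixels that support competing classes. The sketch below illustrates that distinction with plain input gradients on a pretrained torchvision ResNet-50. It is a minimal, hypothetical illustration under assumptions made here (vanilla gradients, negative-gradient and competing-class heuristics), not the method proposed in the paper.

```python
# Hypothetical sketch (not the paper's method): gradient-based heatmaps
# for the two kinds of "undesirable" regions described in the abstract.
# Assumes torchvision >= 0.13 for the weights API.
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def undesirable_heatmaps(image: Image.Image, target_class: int):
    x = preprocess(image).unsqueeze(0).requires_grad_(True)
    logits = model(x)

    # (1) Factors for the target class: pixels whose increase would
    # lower the target logit, i.e. negative input gradients.
    model.zero_grad()
    logits[0, target_class].backward(retain_graph=True)
    target_grad = x.grad.detach().clone()
    hindering = (-target_grad).clamp(min=0).sum(dim=1)[0]  # HxW map

    # (2) Factors for non-target classes: pixels supporting the
    # strongest competing class (positive gradients of its logit).
    x.grad = None
    top2 = logits[0].topk(2).indices
    rival = top2[1] if top2[0] == target_class else top2[0]
    model.zero_grad()
    logits[0, rival].backward()
    supporting_other = x.grad.detach().clamp(min=0).sum(dim=1)[0]  # HxW map

    return hindering, supporting_other

# Example usage (hypothetical file path and class index):
# img = Image.open("dog.jpg").convert("RGB")
# hinder_map, other_map = undesirable_heatmaps(img, target_class=207)
```

The two returned maps can be overlaid on the input as heatmaps; any attribution method (e.g. integrated gradients) could replace the raw gradients in this sketch.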