Towards Human-Centred Explainability Benchmarks For Text Classification

Progress on many Natural Language Processing (NLP) tasks, such as text classification, is driven by objective, reproducible and scalable evaluation via publicly available benchmarks. However, these are not always representative of real-world scenarios where text classifiers are employed, such as sen...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2022-11
Hauptverfasser:	Schlegel, Viktor, Mendez-Guzman, Erick, Batista-Navarro, Riza
Format:	Artikel
Sprache:	eng
Schlagworte:	Benchmarks Classification Classifiers Data mining Evaluation Natural language processing Sentiment analysis
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Progress on many Natural Language Processing (NLP) tasks, such as text classification, is driven by objective, reproducible and scalable evaluation via publicly available benchmarks. However, these are not always representative of real-world scenarios where text classifiers are employed, such as sentiment analysis or misinformation detection. In this position paper, we put forward two points that aim to alleviate this problem. First, we propose to extend text classification benchmarks to evaluate the explainability of text classifiers. We review challenges associated with objectively evaluating the capabilities to produce valid explanations which leads us to the second main point: We propose to ground these benchmarks in human-centred applications, for example by using social media, gamification or to learn explainability metrics from human judgements.
ISSN:	2331-8422