GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives

2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics Human annotation plays a core role in machine learning -- annotations for supervised models, safety guardrails for generative models, and human feedback for reinforcement learning, to cite a few ave...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Prabhakaran, Vinodkumar, Homan, Christopher, Aroyo, Lora, Davani, Aida Mostafazadeh, Parrish, Alicia, Taylor, Alex, Díaz, Mark, Wang, Ding, Serapio-García, Gregory
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics Human annotation plays a core role in machine learning -- annotations for supervised models, safety guardrails for generative models, and human feedback for reinforcement learning, to cite a few avenues. However, the fact that many of these human annotations are inherently subjective is often overlooked. Recent work has demonstrated that ignoring rater subjectivity (typically resulting in rater disagreement) is problematic within specific tasks and for specific subgroups. Generalizable methods to harness rater disagreement and thus understand the socio-cultural leanings of subjective tasks remain elusive. In this paper, we propose GRASP, a comprehensive disagreement analysis framework to measure group association in perspectives among different rater sub-groups, and demonstrate its utility in assessing the extent of systematic disagreements in two datasets: (1) safety annotations of human-chatbot conversations, and (2) offensiveness annotations of social media posts, both annotated by diverse rater pools across different socio-demographic axes. Our framework (based on disagreement metrics) reveals specific rater groups that have significantly different perspectives than others on certain tasks, and helps identify demographic axes that are crucial to consider in specific task contexts.
DOI:10.48550/arxiv.2311.05074