Data Checklist: On Unit-Testing Datasets with Usable Information
Format: Article
Language: English
Online access: Order full text
Abstract: Model checklists (Ribeiro et al., 2020) have emerged as a useful tool for understanding the behavior of LLMs, analogous to unit-testing in software engineering. However, despite datasets being a key determinant of model behavior, evaluating datasets, e.g., for the existence of annotation artifacts, is largely done ad hoc, once a problem in model behavior has already been found downstream. In this work, we take a more principled approach to unit-testing datasets by proposing a taxonomy based on the V-information literature. We call a collection of such unit tests a data checklist. Using a checklist, not only are we able to recover known artifacts in well-known datasets such as SNLI, but we also discover previously unknown artifacts in preference datasets for LLM alignment. Data checklists further enable a new kind of data filtering, which we use to improve the efficacy and data efficiency of preference alignment.
DOI: 10.48550/arxiv.2408.02919
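
As a rough illustration of what a single checklist item might look like, here is a minimal sketch (not the authors' implementation) of a hypothesis-only artifact test based on V-usable information: fit a simple model family V on one view of the input and compare its held-out log-loss to that of a label-marginal baseline. A large gap means that view alone carries usable information about the label, which for the hypothesis-only view of an NLI dataset such as SNLI signals an annotation artifact. The choice of V (a bag-of-words logistic regression), the function names, and the 0.2-bit threshold are all illustrative assumptions.

```python
# Sketch of one dataset "unit test": estimate V-usable information
# I_V(X -> Y) = H_V(Y) - H_V(Y | X) for a hypothesis-only view of an
# NLI dataset. High values flag an annotation artifact, since labels
# should not be predictable without the premise.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def conditional_entropy(train_texts, train_labels, test_texts, test_labels):
    """H_V(Y | X): held-out average negative log-likelihood (in bits)
    of the best model in V fit on (X, Y)."""
    vec = CountVectorizer()
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(train_texts), train_labels)
    probs = clf.predict_proba(vec.transform(test_texts))
    idx = [list(clf.classes_).index(y) for y in test_labels]
    nll = -np.log2(probs[np.arange(len(idx)), idx] + 1e-12)
    return float(nll.mean())


def marginal_entropy(train_labels, test_labels):
    """H_V(Y): held-out negative log-likelihood under the label marginal,
    i.e. the null-input member of V."""
    classes, counts = np.unique(train_labels, return_counts=True)
    p = dict(zip(classes, counts / counts.sum()))
    return float(np.mean([-np.log2(p[y]) for y in test_labels]))


def hypothesis_only_artifact_test(hypotheses, labels, threshold_bits=0.2):
    """One checklist item: fail if the hypothesis alone carries substantial
    usable information about the gold label (threshold is illustrative)."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        hypotheses, labels, test_size=0.3, random_state=0, stratify=labels
    )
    v_info = marginal_entropy(y_tr, y_te) - conditional_entropy(x_tr, y_tr, x_te, y_te)
    return v_info < threshold_bits, v_info


# Example (hypothetical column names; real use would load e.g. SNLI):
#   passed, bits = hypothesis_only_artifact_test(df["hypothesis"], df["label"])
#   print(f"hypothesis-only V-information: {bits:.2f} bits, pass={passed}")
```

In this framing, a data checklist would collect many such tests over different input views and model families, and per-example versions of the same estimates could plausibly drive the kind of data filtering the abstract mentions for preference alignment; the paper itself should be consulted for the actual taxonomy and filtering procedure.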