RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors
Saved in:
Main author(s): | , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Many commercial and open-source models claim to detect machine-generated text with extremely high accuracy (99% or more). However, very few of these detectors are evaluated on shared benchmark datasets, and even when they are, the datasets used for evaluation are insufficiently challenging, lacking variations in sampling strategy, adversarial attacks, and open-source generative models. In this work we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks, and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open- and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models. We release our data along with a leaderboard to encourage future research. |
DOI: | 10.48550/arxiv.2405.07940 |
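The abstract describes evaluating detector robustness across several dimensions of the benchmark (generator model, domain, adversarial attack, decoding strategy). The following is a minimal sketch, not the paper's official tooling, of how a per-dimension accuracy breakdown could be computed; all field names and toy records are hypothetical illustrations.

```python
# Hypothetical sketch: break detector accuracy down by benchmark dimension
# (e.g. adversarial attack, decoding strategy). Not the official RAID code.
from collections import defaultdict

# Toy records: (attack, decoding_strategy, is_machine_generated, detector_flagged_machine)
records = [
    ("none",       "greedy",   True,  True),
    ("none",       "sampling", True,  True),
    ("homoglyph",  "greedy",   True,  False),
    ("paraphrase", "sampling", True,  False),
    ("whitespace", "greedy",   True,  True),
    ("none",       "greedy",   False, False),  # human-written text
    ("none",       "sampling", False, True),   # false positive on human text
]

def accuracy_by(records, key_index):
    """Group records by one metadata dimension and compute detector accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        key = rec[key_index]
        is_machine, flagged = rec[2], rec[3]
        total[key] += 1
        correct[key] += int(flagged == is_machine)
    return {key: correct[key] / total[key] for key in total}

print("Accuracy per adversarial attack:", accuracy_by(records, 0))
print("Accuracy per decoding strategy:", accuracy_by(records, 1))
```

A breakdown like this makes the paper's central finding visible in miniature: a detector that looks strong on unattacked, greedy-decoded text can degrade sharply on attacked or differently sampled generations.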