Real or Fake? Learning to Discriminate Machine from Human Generated Text
Main authors:
Format: Article
Language: English
Keywords:
Online access: Order full text
Abstract: Energy-based models (EBMs), a.k.a. un-normalized models, have had recent successes in continuous spaces. However, they have not been successfully applied to model text sequences. While decreasing the energy at training samples is straightforward, mining (negative) samples where the energy should be increased is difficult. In part, this is because standard gradient-based methods are not readily applicable when the input is high-dimensional and discrete. Here, we side-step this issue by generating negatives using pre-trained auto-regressive language models. The EBM then works in the residual of the language model, and is trained to discriminate real text from text generated by the auto-regressive models. We investigate the generalization ability of residual EBMs, a prerequisite for using them in other applications. We extensively analyze generalization for the task of classifying whether an input is machine or human generated, a natural task given the training loss and how we mine negatives. Overall, we observe that EBMs can generalize remarkably well to changes in the architecture of the generators producing negatives. However, EBMs exhibit more sensitivity to the training set used by such generators.
DOI: 10.48550/arxiv.1906.03351
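The abstract describes training an EBM in the residual of a pretrained auto-regressive language model: the generator samples negative sequences, and the energy function is pushed down on human text and up on the sampled text. Below is a minimal sketch of that training loop, assuming GPT-2 as the negative generator and a binary cross-entropy-style objective over energies; `ToyEnergyModel`, `training_step`, and all hyperparameters are illustrative assumptions, not the architecture or loss used in the paper.

```python
# Hedged sketch: score human vs. machine-generated text with a toy energy model,
# using a pretrained auto-regressive LM (GPT-2) purely as a negative sampler.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
generator = GPT2LMHeadModel.from_pretrained("gpt2").eval()  # frozen negative generator


class ToyEnergyModel(nn.Module):
    """Tiny stand-in for the EBM: embed tokens, mean-pool, output one scalar energy."""

    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids):
        h = self.embed(token_ids).mean(dim=1)      # (batch, dim)
        return self.score(h).squeeze(-1)           # (batch,) energy per sequence


ebm = ToyEnergyModel(tokenizer.vocab_size)
opt = torch.optim.Adam(ebm.parameters(), lr=1e-4)


def training_step(real_texts, prefix_len=10, max_len=40):
    """One update: low energy on real text, high energy on LM continuations."""
    real = tokenizer(real_texts, return_tensors="pt", padding=True,
                     truncation=True, max_length=max_len)
    real_ids = real["input_ids"]

    # Negatives: let the pretrained generator continue the same prefixes.
    with torch.no_grad():
        fake_ids = generator.generate(real_ids[:, :prefix_len], do_sample=True,
                                      top_k=50, max_length=max_len,
                                      pad_token_id=tokenizer.eos_token_id)

    e_real = ebm(real_ids)
    e_fake = ebm(fake_ids)

    # Binary-classification-style objective: negative energy acts as the "real" logit.
    logits = torch.cat([-e_real, -e_fake])
    labels = torch.cat([torch.ones_like(e_real), torch.zeros_like(e_fake)])
    loss = F.binary_cross_entropy_with_logits(logits, labels)

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

After training, thresholding the energy (or the derived logit) yields a machine-vs-human classifier, which is the generalization task the abstract analyzes when the test-time generator's architecture or training data differs from the one used to mine negatives.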