With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness
Saved in:

Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Summary: Conditional language models still generate unfaithful output that is not supported by their input. These unfaithful generations jeopardize trust in real-world applications such as summarization or human-machine interaction, motivating a need for automatic faithfulness metrics. To implement such metrics, NLI models seem attractive, since they solve a strongly related task that comes with a wealth of prior research and data. But recent research suggests that NLI models require costly additional machinery to perform reliably across datasets, e.g., by running inference on a Cartesian product of input and generated sentences, or supporting them with a question-generation/answering step.

In this work we show that pure NLI models _can_ outperform more complex metrics when combining task-adaptive data augmentation with robust inference procedures. We propose: (1) augmenting NLI training data to adapt NL inferences to the specificities of faithfulness prediction in dialogue; (2) making use of both entailment and contradiction probabilities in NLI; and (3) using Monte-Carlo dropout during inference. Applied to the TRUE benchmark, which combines faithfulness datasets across diverse domains and tasks, our approach strongly improves a vanilla NLI model and significantly outperforms previous work, while showing favourable computational cost.
DOI: 10.48550/arxiv.2305.16819
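Points (2) and (3) of the abstract — combining entailment and contradiction probabilities, and Monte-Carlo dropout at inference time — can be sketched as follows. This is a minimal, self-contained illustration, not the paper's implementation: `toy_nli_probs` is a hypothetical stand-in for a real NLI model (in practice one would run a fine-tuned transformer with dropout left active), and the `entailment − contradiction` combination rule is an assumption chosen for illustration.

```python
import random

def toy_nli_probs(premise: str, hypothesis: str, dropout: bool = False):
    """Hypothetical stand-in for an NLI model: returns (entailment,
    contradiction, neutral) probabilities. A token-overlap heuristic plus
    optional random noise emulates the stochasticity that active dropout
    would introduce in a real model's forward passes."""
    p_tokens = set(premise.lower().split())
    h_tokens = set(hypothesis.lower().split())
    overlap = len(p_tokens & h_tokens) / max(len(h_tokens), 1)
    noise = random.uniform(-0.1, 0.1) if dropout else 0.0
    entail = min(max(overlap + noise, 0.0), 1.0)
    contradict = (1.0 - entail) * 0.5
    neutral = 1.0 - entail - contradict
    return entail, contradict, neutral

def faithfulness_score(premise: str, hypothesis: str,
                       mc_samples: int = 8, seed: int = 0) -> float:
    """Monte-Carlo estimate: keep dropout active, run several stochastic
    forward passes, average the class probabilities, and combine
    entailment and contradiction (their difference is one plausible
    combination rule; the paper's exact rule may differ)."""
    random.seed(seed)
    entail_sum = contradict_sum = 0.0
    for _ in range(mc_samples):
        e, c, _ = toy_nli_probs(premise, hypothesis, dropout=True)
        entail_sum += e
        contradict_sum += c
    return (entail_sum - contradict_sum) / mc_samples
```

A supported generation should score near the top of the range, while an unsupported one is pulled down by its contradiction mass; averaging over several dropout samples smooths out single-pass noise.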