MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue
Format: Article
Language: English
Online access: Order full text
Abstract: Automatic open-domain dialogue evaluation is a crucial component of
dialogue systems. Recently, learning-based evaluation metrics have achieved
state-of-the-art performance in open-domain dialogue evaluation. However, these
metrics, which focus on only a few qualities, struggle to evaluate dialogue
comprehensively. Furthermore, they lack an effective score composition approach
for diverse evaluation qualities. To address these problems, we propose a
Multi-Metric Evaluation based on Correlation Re-Scaling (MME-CRS) for
evaluating open-domain dialogue. First, we build an evaluation metric composed
of 5 groups of parallel sub-metrics, called Multi-Metric Evaluation (MME), to
evaluate dialogue quality comprehensively. Second, we propose a novel score
composition method called Correlation Re-Scaling (CRS) to model the
relationship between sub-metrics and diverse qualities. Our approach MME-CRS
ranks first by a large margin on the final test data of the DSTC10 Track 5
Subtask 1 "Automatic Open-domain Dialogue Evaluation Challenge", which
demonstrates the effectiveness of our proposed approach.
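The abstract does not spell out how CRS composes sub-metric scores into quality scores. A minimal sketch of one plausible reading, assuming each sub-metric's weight for a given quality is its correlation with human annotations on a development set, re-scaled by a power and normalized; the function names, the `power` parameter, and the choice of Spearman correlation are illustrative assumptions, not taken from the paper:

```python
# Hypothetical CRS-style score composition (illustrative, not the paper's code).
import numpy as np
from scipy.stats import spearmanr

def crs_weights(sub_scores: np.ndarray, human_scores: np.ndarray,
                power: float = 2.0) -> np.ndarray:
    """Weight each sub-metric for one quality by its re-scaled correlation
    with human annotations.
    sub_scores: (n_samples, n_submetrics) dev-set sub-metric scores.
    human_scores: (n_samples,) human ratings for that quality."""
    corrs = np.array([
        max(spearmanr(sub_scores[:, j], human_scores)[0], 0.0)  # clip negatives
        for j in range(sub_scores.shape[1])
    ])
    scaled = corrs ** power                  # re-scaling: amplify stronger sub-metrics
    return scaled / (scaled.sum() + 1e-12)   # normalize into a convex combination

def composed_score(sub_scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Final per-sample score for the quality: weighted sum of sub-metric scores."""
    return sub_scores @ weights

# Toy usage: 100 dialogue samples, 5 hypothetical sub-metric scores each.
rng = np.random.default_rng(0)
subs = rng.random((100, 5))
human = subs[:, 0] * 0.7 + rng.random(100) * 0.3  # sub-metric 0 tracks humans best
w = crs_weights(subs, human, power=2.0)
print("weights:", np.round(w, 3))  # sub-metric 0 should receive the largest weight
print("scores:", composed_score(subs, w)[:3])
```

Raising correlations to a power before normalizing concentrates weight on the sub-metrics that best track human judgment for each quality, which matches the abstract's stated goal of modeling the relationship between sub-metrics and diverse qualities.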
DOI: 10.48550/arxiv.2206.09403