Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Main authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Automatic metrics are extensively used to evaluate natural language
processing systems. However, there has been increasing focus on how they are
used and reported by practitioners within the field. In this paper, we have
conducted a survey on the use of automatic metrics, focusing particularly on
natural language generation (NLG) tasks. We inspect which metrics are used as
well as why they are chosen and how their use is reported. Our findings from
this survey reveal significant shortcomings, including inappropriate metric
usage, lack of implementation details and missing correlations with human
judgements. We conclude with recommendations that we believe authors should
follow to enable more rigour within the field. |
---|---|
DOI: | 10.48550/arxiv.2408.09169 |