Return of EM: Entity-driven Answer Set Expansion for QA Evaluation
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Recently, directly using large language models (LLMs) has been shown to be
the most reliable method to evaluate QA models. However, it suffers from
limited interpretability, high cost, and environmental harm. To address these,
we propose to use soft EM with entity-driven answer set expansion. Our approach
expands the gold answer set to include diverse surface forms, based on the
observation that the surface forms often follow particular patterns depending
on the entity type. The experimental results show that our method outperforms
traditional evaluation methods by a large margin. Moreover, the reliability of
our evaluation method is comparable to that of LLM-based ones, while offering
the benefits of high interpretability and reduced environmental harm. |
---|---|
DOI: | 10.48550/arxiv.2404.15650 |
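
The summary describes soft EM scored against a gold answer set that has been expanded with additional surface forms chosen by entity type. The sketch below is only an illustration of that idea and not the authors' implementation. It assumes that soft EM counts a candidate as correct when any normalized gold answer appears in it as a token span. The function names (`normalize`, `soft_em`, `expand_answers`) and the two hand-written expansion rules are hypothetical; the record does not say how the paper generates surface forms.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def soft_em(candidate: str, gold_answers: set[str]) -> bool:
    """True if any normalized gold answer occurs as a whole-token span in the candidate."""
    padded = f" {normalize(candidate)} "
    # Skip gold strings that normalize to nothing (e.g. a bare article).
    return any(f" {normalize(g)} " in padded for g in gold_answers if normalize(g))


def expand_answers(gold: str, entity_type: str) -> set[str]:
    """Toy expansion rules keyed on entity type (assumed for illustration only)."""
    forms = {gold}
    if entity_type == "person":
        parts = gold.split()
        if len(parts) > 1:
            forms.add(parts[-1])  # also accept the surname on its own
    elif entity_type == "number":
        words = {"1": "one", "2": "two", "3": "three"}
        if gold in words:
            forms.add(words[gold])  # also accept the spelled-out numeral
    return forms
```

With these toy rules, `soft_em("It was written by Austen.", {"Jane Austen"})` fails, while the same candidate scored against `expand_answers("Jane Austen", "person")` passes, which is the kind of surface-form mismatch the summary says expansion is meant to resolve. Each decision can be traced to the specific gold form that matched, which is one way to read the interpretability benefit the summary mentions.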