Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | LLM-as-a-judge approaches are a practical and effective way of assessing a
range of text tasks. However, when using pairwise comparisons to rank a set of
candidates, the computational cost scales quadratically with the number of
candidates, which limits practical use. This paper introduces a Product of
Experts (PoE) framework for efficient LLM Comparative Assessment. Here
individual comparisons are considered experts that provide information on a
pair's score difference. The PoE framework combines the information from these
experts to yield an expression that can be maximized with respect to the
underlying set of candidates, and is highly flexible in that any form of expert
can be assumed. When Gaussian experts are used one can derive simple
closed-form solutions for the optimal candidate ranking, and expressions for
selecting which comparisons should be made to maximize the probability of this
ranking. Our approach enables efficient comparative assessment, where by using
only a small subset of the possible comparisons, one can generate score
predictions that correlate well with human judgements. We evaluate the approach
on multiple NLG tasks and demonstrate that our framework can yield considerable
computational savings when performing pairwise comparative assessment. With
many candidate texts, using as few as 2% of comparisons the PoE solution can
achieve similar performance to when all comparisons are used. |
---|---|
DOI: | 10.48550/arxiv.2405.05894 |
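
The summary states that Gaussian experts give a closed-form solution for the candidate scores. The record does not reproduce the paper's formula, so the following is only a minimal sketch under stated assumptions: each comparison of candidates i and j is treated as a Gaussian expert on the score difference s_i - s_j, all experts share the same variance, and the observed difference `diff` is a hypothetical number derived from the LLM judge (for example from its preference probability). Under those assumptions, maximizing the product of experts reduces to ordinary least squares. The function name `gaussian_poe_scores` and the example data are illustrative and are not taken from the paper.

```python
import numpy as np


def gaussian_poe_scores(n_candidates, comparisons):
    """Estimate candidate scores from a subset of pairwise comparisons.

    comparisons: list of (i, j, diff), where diff is an (assumed) estimate
    of s_i - s_j obtained from one LLM comparison.
    With equal-variance Gaussian experts, the maximum of the product of
    experts is the least-squares solution of W s = d.
    """
    W = np.zeros((len(comparisons), n_candidates))
    d = np.zeros(len(comparisons))
    for k, (i, j, diff) in enumerate(comparisons):
        W[k, i] = 1.0   # +1 for the first candidate of the pair
        W[k, j] = -1.0  # -1 for the second candidate of the pair
        d[k] = diff
    # Scores are only defined up to an additive constant, so W is rank
    # deficient; lstsq handles this, and we centre the result at zero mean.
    scores, *_ = np.linalg.lstsq(W, d, rcond=None)
    return scores - scores.mean()


# Illustrative data: 4 candidates, only 4 of the 6 possible comparisons.
comparisons = [(0, 1, 1.5), (1, 2, 1.5), (2, 3, 0.5), (0, 3, 3.5)]
scores = gaussian_poe_scores(4, comparisons)
ranking = np.argsort(-scores)  # candidate indices, best first
print(scores, ranking)
```

The comparison graph must connect all candidates for the scores to be determined; choosing which pairs to compare, which the summary also mentions, is not covered by this sketch.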