Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playing
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Persona-driven role-playing (PRP) aims to build AI characters that can
respond to user queries by faithfully sticking with all persona statements.
Unfortunately, existing faithfulness criteria for PRP are limited to
coarse-grained LLM-based scoring without a clear definition or formulation.
This paper presents a pioneering exploration to quantify PRP faithfulness as a
fine-grained and explainable criterion, which also serves as a reliable
reference for optimization. Our criterion first discriminates persona
statements into active and passive constraints by identifying the
query-statement relevance. Then, we incorporate all constraints following the
principle that the AI character's response should be (a) entailed by active
(relevant) constraints and (b) not contradicted by passive (irrelevant)
constraints. We translate this principle mathematically into a novel
Active-Passive-Constraint (APC) score, a constraint-wise sum of natural
language inference (NLI) scores weighted by relevance scores. In practice, we
build the APC scoring system by symbolically distilling small discriminators
from GPT-4 for efficiency. We validate the quality of the APC score against
human evaluation based on example personas with tens of statements, and the
results show a high correlation. We further leverage it as a reward system in
direct preference optimization (DPO) for better AI characters. Our experiments
offer a fine-grained and explainable comparison between existing PRP
techniques, revealing their advantages and limitations. We further find that
APC-based DPO is one of the most competitive techniques for sticking with
all constraints and can be well incorporated with other techniques. We then
extend the scale of the experiments to real persons with hundreds of statements
and reach a consistent conclusion. |
---|---|
DOI: | 10.48550/arxiv.2405.07726 |
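
The summary describes the APC score as a constraint-wise sum of NLI scores weighted by relevance scores, used afterwards as a reward for DPO. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the paper's implementation: the exact weighting (relevance times entailment, plus one minus relevance times one minus contradiction), the `ConstraintScores` container, and the `preference_pair` helper are all hypothetical, and the probabilities are assumed to come from some external relevance and NLI discriminators.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ConstraintScores:
    """Scores for one persona statement (hypothetical container).

    All three values are assumed to be probabilities in [0, 1] produced by
    external discriminators; none of them is computed here.
    """
    relevance: float      # P(statement is relevant to the query)
    entailment: float     # P(statement entails the response)
    contradiction: float  # P(statement contradicts the response)


def apc_score(constraints: Sequence[ConstraintScores]) -> float:
    """Sum a relevance-weighted NLI term over all persona statements.

    Assumed form: a relevant (active) statement rewards entailment, an
    irrelevant (passive) statement rewards the absence of contradiction.
    """
    total = 0.0
    for c in constraints:
        active_term = c.relevance * c.entailment
        passive_term = (1.0 - c.relevance) * (1.0 - c.contradiction)
        total += active_term + passive_term
    return total


def preference_pair(
    response_a: str,
    scores_a: Sequence[ConstraintScores],
    response_b: str,
    scores_b: Sequence[ConstraintScores],
) -> Tuple[str, str]:
    """Return (chosen, rejected) by comparing the two APC scores.

    Ties go to response_a; a real pipeline might instead drop tied pairs.
    """
    if apc_score(scores_a) >= apc_score(scores_b):
        return response_a, response_b
    return response_b, response_a
```

With one fully relevant, fully entailed statement and one irrelevant, uncontradicted statement, this sketch assigns a response the maximum of one point per statement. Pairs returned by `preference_pair` could then be passed to a DPO trainer as chosen and rejected responses for the same query.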