Microscopic Analysis on LLM players via Social Deduction Game
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Recent studies have begun developing autonomous game players for social
deduction games using large language models (LLMs). When building LLM players,
fine-grained evaluations are crucial for addressing weaknesses in game-playing
abilities. However, existing studies have often overlooked such assessments.
Specifically, we point out two issues with the evaluation methods employed.
First, game-playing abilities have typically been assessed through game-level
outcomes rather than specific event-level skills; second, error analyses have
lacked structured methodologies. To address these issues, we propose an
approach utilizing a variant of the SpyFall game, named SpyGame. We conducted
an experiment with four LLMs, analyzing their gameplay behavior in SpyGame both
quantitatively and qualitatively. For the quantitative analysis, we introduced
eight metrics to resolve the first issue, revealing that these metrics are more
effective than existing ones for evaluating the two critical skills: intent
identification and camouflage. In the qualitative analysis, we performed
thematic analysis to resolve the second issue. This analysis identifies four
major categories that affect gameplay of LLMs. Additionally, we demonstrate how
these categories complement and support the findings from the quantitative
analysis. |
---|---|
DOI: | 10.48550/arxiv.2408.09946 |