Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | AAAI 2024. Instruction following is particularly crucial for large language models (LLMs) to support diverse user requests. While existing work has made progress in aligning LLMs with human preferences, evaluating their instruction-following capabilities remains a challenge due to the complexity and diversity of real-world user instructions. Existing evaluation methods focus on general skills and suffer from two main shortcomings: a lack of fine-grained task-level evaluation and reliance on a single expression of each instruction. To address these problems, this paper introduces DINGO, a fine-grained and diverse instruction-following evaluation dataset with two main advantages: (1) DINGO is based on a manually annotated, fine-grained, multi-level category tree with 130 nodes derived from real-world user requests; (2) DINGO includes diverse instructions generated by both GPT-4 and human experts. Through extensive experiments, we demonstrate that DINGO not only provides a more challenging and comprehensive evaluation for LLMs, but also provides task-level, fine-grained directions for further improving LLMs. |
DOI: | 10.48550/arxiv.2407.03942 |
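The abstract describes DINGO's data layout only at a high level: a manually annotated, multi-level category tree whose leaf tasks each carry several instruction phrasings written by GPT-4 and by human experts. As a rough illustration of such a layout, the Python sketch below models a category tree and enumerates its task/instruction pairs; all class, field, and function names here are hypothetical and are not taken from the DINGO release.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the layout the abstract describes: a multi-level
# category tree whose leaf nodes hold diverse instruction variants.
# Names below are illustrative, not DINGO's actual schema.

@dataclass
class CategoryNode:
    name: str
    children: List["CategoryNode"] = field(default_factory=list)
    # Leaf nodes store several phrasings of the same task, so a model is
    # evaluated under varied wording rather than a single fixed expression.
    instructions: List[str] = field(default_factory=list)

def iter_leaf_tasks(node: CategoryNode, path=()):
    """Yield (category path, instruction) pairs for every leaf task."""
    path = path + (node.name,)
    if not node.children:
        for instr in node.instructions:
            yield path, instr
    for child in node.children:
        yield from iter_leaf_tasks(child, path)

# Tiny two-level example; DINGO's real tree has 130 nodes.
root = CategoryNode("generation", children=[
    CategoryNode("summarization", instructions=[
        "Summarize the following article in three sentences.",
        "Give a brief three-sentence summary of the text below.",
    ]),
])

for category_path, instruction in iter_leaf_tasks(root):
    print(" > ".join(category_path), "|", instruction)
```

Grouping instructions under category paths this way is what makes task-level reporting possible: scores can be aggregated per node of the tree instead of only as a single overall number.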