StrucText-Eval: Evaluating Large Language Model's Reasoning Ability in Structure-Rich Text
Saved in:
Main author: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
Abstract: | The effective utilization of structured data, integral to corporate data
strategies, has been challenged by the rise of large language models (LLMs)
capable of processing unstructured information. This shift prompts the
question: can LLMs interpret structured data directly in its unstructured form?
To explore this, we propose an automatic evaluation data generation method for
assessing LLMs' reasoning capabilities on structure-rich text. Our approach
supports 8 structured languages and 29 tasks, generating data with adjustable
complexity through controllable nesting and structural width. We introduce
StrucText-Eval, a benchmark containing 5,800 pre-generated and annotated
samples designed to evaluate how well LLMs understand and reason through
structured text. StrucText-Eval is divided into two suites: a regular Test
suite (3,712 samples) and a Test-Hard suite (2,088 samples), the latter
emphasizing the gap between human and model performance on more complex tasks.
Experimental results show that while open-source LLMs achieve a maximum
accuracy of 74.9% on the standard dataset, their performance drops
significantly to 45.8% on the harder dataset. In contrast, human participants
reach an accuracy of 92.6% on StrucText-Eval-Hard, highlighting LLMs' current
limitations in handling intricate structural information. The benchmark and
generation code are open-sourced at
https://github.com/MikeGu721/StrucText-Eval |
DOI: | 10.48550/arxiv.2406.10621 |
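The abstract describes generating evaluation data with complexity controlled through nesting depth and structural width. A minimal sketch of what such a generator might look like for JSON (one plausible structured language); the function names and the path-retrieval task format are illustrative assumptions, not the paper's actual implementation:

```python
import json
import random

def generate_nested(depth, width, rng):
    # Recursively build a dict with the given nesting depth and
    # `width` keys per level; leaves are random integers.
    # (Illustrative assumption, not the paper's generator.)
    if depth == 0:
        return rng.randint(0, 99)
    return {f"key{i}": generate_nested(depth - 1, width, rng)
            for i in range(width)}

def make_sample(depth, width, seed=0):
    # Create one sample: a structured-text context plus a
    # path-retrieval question whose gold answer is known by
    # construction, so no manual annotation is needed.
    rng = random.Random(seed)
    data = generate_nested(depth, width, rng)
    # Walk a random root-to-leaf path; its leaf is the gold answer.
    path, node = [], data
    while isinstance(node, dict):
        key = rng.choice(sorted(node))
        path.append(key)
        node = node[key]
    return {
        "context": json.dumps(data, indent=2),
        "question": "What value is stored at " + " -> ".join(path) + "?",
        "answer": node,
    }

sample = make_sample(depth=3, width=2, seed=42)
```

Raising `depth` and `width` scales the context size and reasoning difficulty automatically, which is the kind of knob a Test vs. Test-Hard split could be built on.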