Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus
Format: Article
Language: English
Abstract: The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been predominantly results-centric, making it challenging to assess the inference process comprehensively. We introduce a novel approach using the Abstraction and Reasoning Corpus (ARC) benchmark to evaluate the inference and contextual understanding abilities of LLMs in a process-centric manner, focusing on three key components from the Language of Thought Hypothesis (LoTH): Logical Coherence, Compositionality, and Productivity. Our carefully designed experiments reveal that while LLMs demonstrate some inference capabilities, they still significantly lag behind human-level reasoning in these three aspects. The main contribution of this paper lies in introducing the LoTH perspective, which provides a method for evaluating the reasoning process that conventional results-oriented approaches fail to capture, thereby offering new insights into the development of human-level reasoning in artificial intelligence systems.
DOI: 10.48550/arxiv.2403.11793
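
For context, below is a minimal sketch of how an ARC task can be presented to an LLM as a text prompt. This is not the paper's evaluation code: it assumes only the public ARC task format (JSON files containing "train" and "test" lists of input/output grid pairs), and the file path and the commented-out `query_llm` call are hypothetical stand-ins for a concrete dataset location and LLM API.

```python
# Illustrative sketch only; not the paper's method. Assumes the public ARC
# JSON task format: {"train": [{"input": grid, "output": grid}, ...],
#                    "test":  [{"input": grid, "output": grid}, ...]}
import json


def format_grid(grid):
    """Render a 2D list of integers as space-separated rows for a text prompt."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def build_prompt(task):
    """Turn one ARC task (train demonstrations plus a test input) into a prompt."""
    parts = []
    for i, pair in enumerate(task["train"]):
        parts.append(f"Example {i + 1} input:\n{format_grid(pair['input'])}")
        parts.append(f"Example {i + 1} output:\n{format_grid(pair['output'])}")
    parts.append(f"Test input:\n{format_grid(task['test'][0]['input'])}")
    parts.append("Predict the test output grid.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical path to one task file from the public ARC repository.
    with open("arc/training/007bbfb7.json") as f:
        task = json.load(f)
    print(build_prompt(task))
    # response = query_llm(build_prompt(task))  # hypothetical LLM call;
    # a results-centric check would compare the response to task["test"][0]["output"]
```

Per the abstract, the paper's evaluation is process-centric, so the model's reasoning toward the output grid, not only the final answer, would be assessed against the three LoTH criteria.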