Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus

Detailed Description

Bibliographic Details
Main Authors: Lee, Seungpil; Sim, Woochang; Shin, Donghyeon; Seo, Wongyu; Park, Jiwon; Lee, Seokki; Hwang, Sanha; Kim, Sejin; Kim, Sundong
Format: Article
Language: English
Description
Summary: The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been predominantly results-centric, making it challenging to assess the inference process comprehensively. We introduce a novel approach using the Abstraction and Reasoning Corpus (ARC) benchmark to evaluate the inference and contextual understanding abilities of LLMs in a process-centric manner, focusing on three key components from the Language of Thought Hypothesis (LoTH): Logical Coherence, Compositionality, and Productivity. Our carefully designed experiments reveal that while LLMs demonstrate some inference capabilities, they still significantly lag behind human-level reasoning in these three aspects. The main contribution of this paper lies in introducing the LoTH perspective, which provides a method for evaluating the reasoning process that conventional results-oriented approaches fail to capture, thereby offering new insights into the development of human-level reasoning in artificial intelligence systems.
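
For readers unfamiliar with the benchmark the paper evaluates on, each ARC task is a small set of input-output grid pairs. The sketch below shows how a task is laid out in the public ARC repository's JSON format (github.com/fchollet/ARC); the file path is illustrative only and is not taken from the paper.

```python
# Minimal sketch of the public ARC task format: each task is a JSON
# file with "train" and "test" lists of input/output pairs, where a
# grid is a 2-D list of integers 0-9 (each integer denoting a color).
import json

with open("data/training/0a938d79.json") as f:  # illustrative path
    task = json.load(f)

for pair in task["train"]:  # demonstration pairs a solver may study
    grid_in, grid_out = pair["input"], pair["output"]
    print(len(grid_in), "x", len(grid_in[0]), "->",
          len(grid_out), "x", len(grid_out[0]))

test_input = task["test"][0]["input"]  # grid whose output must be predicted
```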
DOI: 10.48550/arXiv.2403.11793