Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models
Main authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Recent advancements in generative AI systems have raised concerns about
academic integrity among educators. Beyond excelling at solving programming
problems and text-based multiple-choice questions, recent research has also
found that large multimodal models (LMMs) can solve Parsons problems based only
on an image. However, such problems are still inherently text-based and rely on
the capabilities of the models to convert the images of code blocks to their
corresponding text. In this paper, we further investigate the capabilities of
LMMs to solve graph and tree data structure problems based only on images. To
achieve this, we computationally construct and evaluate a novel benchmark
dataset comprising 9,072 samples of diverse graph and tree data structure tasks
to assess the performance of the GPT-4o, GPT-4v, Gemini 1.5 Pro, Gemini 1.5
Flash, Gemini 1.0 Pro Vision, and Claude 3 model families. GPT-4o and Gemini
1.5 Flash performed best on trees and graphs, respectively. GPT-4o achieved
87.6% accuracy on tree samples, while Gemini 1.5 Flash achieved 56.2% accuracy
on graph samples. Our findings highlight the influence of structural and visual
variations on model performance. This research not only introduces an LMM
benchmark to facilitate replication and further exploration but also
underscores the potential of LMMs in solving complex computing problems, with
important implications for pedagogy and assessment practices. |
---|---|
DOI: | 10.48550/arxiv.2412.11088 |