SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Interpreting complex charts with logical reasoning has recently emerged as a
challenge alongside the development of vision-language models. A prior
state-of-the-art (SOTA) model presented an end-to-end method that uses a
vision-language model to convert charts into tables and then applies a Large
Language Model (LLM) for reasoning. However, unlike natural images, charts
contain a mix of information that is essential for chart reasoning and
information that is irrelevant to it, and we find that this characteristic can
lower the performance of chart-to-table extraction. In this paper, we introduce
SIMPLOT, a method designed to extract only the elements necessary for chart
reasoning. The proposed method involves two steps: 1) training the model to
mimic a simple plot that contains only the essential information from a complex
chart, for use in table extraction, followed by 2) performing reasoning based on
the extracted table. Our model enables accurate chart reasoning without the need
for additional annotations or datasets, and its effectiveness is demonstrated
through various experiments. Furthermore, we propose a novel prompt that mimics
how humans interpret charts, for more accurate reasoning. Our source code is
available at
https://github.com/sangwu99/Simplot. |
---|---|
DOI: | 10.48550/arxiv.2405.00021 |
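
The second step described in the summary (reasoning over the extracted table) can be illustrated with a minimal sketch. It is not taken from the SIMPLOT repository: the function names `table_to_text` and `build_prompt`, the pipe-separated table layout, and the prompt wording are all assumptions made for illustration, and the paper's actual prompt is not reproduced here.

```python
from typing import Sequence


def table_to_text(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Linearize a table into pipe-separated lines (hypothetical layout)."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


def build_prompt(table_text: str, question: str) -> str:
    """Combine the linearized table and a question into one LLM prompt.

    The wording is an illustrative assumption, not the prompt from the paper.
    """
    return (
        "Read the table extracted from a chart and answer the question.\n\n"
        f"Table:\n{table_text}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy table standing in for the output of the chart-to-table step.
header = ["Year", "Sales"]
rows = [["2020", 10], ["2021", 15]]

prompt = build_prompt(
    table_to_text(header, rows),
    "In which year were sales highest?",
)
print(prompt)
```

The resulting string would be sent to whichever LLM performs the reasoning. The chart-to-table step itself (the trained vision-language model) is outside the scope of this sketch.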