Recursive Visual Programming
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Visual Programming (VP) has emerged as a powerful framework for Visual
Question Answering (VQA). By generating and executing bespoke code for each
question, these methods demonstrate impressive compositional and reasoning
capabilities, especially in few-shot and zero-shot scenarios. However, existing
VP methods generate all code in a single function, resulting in code that is
suboptimal in terms of both accuracy and interpretability. Inspired by human
coding practices, we propose Recursive Visual Programming (RVP), which
simplifies generated routines, provides more efficient problem solving, and can
manage more complex data structures. RVP approaches VQA tasks through
iterative, recursive code generation,
allowing decomposition of complicated problems into smaller parts. Notably, RVP
is capable of dynamic type assignment, i.e., as the system recursively
generates a new piece of code, it autonomously determines the appropriate
return type and crafts the requisite code to generate that output. We show
RVP's efficacy through extensive experiments on benchmarks including VSR, COVR,
GQA, and NextQA, underscoring the value of adopting human-like recursive and
modular programming techniques for solving VQA tasks through coding. |
DOI: | 10.48550/arxiv.2312.02249 |
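
The recursive decomposition and dynamic return typing described in the summary can be illustrated with a toy sketch. This is not the authors' implementation: the names `generate_code`, `recursive_query`, `execute_command`, and `find` are hypothetical, the "image" is just a list of object labels, and a lookup table of canned programs stands in for the language model that would normally write the code.

```python
from typing import Any, Dict, List

# ASSUMPTION: a lookup table stands in for an LLM code generator.
# Each entry is the source of a routine that answers one question.
CANNED_PROGRAMS: Dict[str, str] = {
    "Are there more cats than dogs?": (
        "def execute_command(image):\n"
        "    # Delegate the two sub-questions to recursively generated routines.\n"
        "    cats = recursive_query(image, 'How many cats are there?')\n"
        "    dogs = recursive_query(image, 'How many dogs are there?')\n"
        "    return 'yes' if cats > dogs else 'no'\n"
    ),
    "How many cats are there?": (
        "def execute_command(image):\n"
        "    return len(find(image, 'cat'))\n"
    ),
    "How many dogs are there?": (
        "def execute_command(image):\n"
        "    return len(find(image, 'dog'))\n"
    ),
}


def find(image: List[str], name: str) -> List[str]:
    """Toy detector (hypothetical): return every label equal to `name`."""
    return [label for label in image if label == name]


def generate_code(question: str) -> str:
    """Hypothetical generator: look up a canned program for the question."""
    return CANNED_PROGRAMS[question]


def recursive_query(image: List[str], question: str,
                    depth: int = 0, max_depth: int = 3) -> Any:
    """Generate code for `question`, run it, and return whatever it returns.

    The return type is not fixed in advance: each generated routine decides
    it (an int for counting questions, a str for the yes/no question).
    """
    if depth > max_depth:
        raise RecursionError("question decomposition went too deep")

    def nested(img: List[str], sub_question: str) -> Any:
        # Sub-questions re-enter this function one level deeper.
        return recursive_query(img, sub_question, depth + 1, max_depth)

    # Names made visible to the generated routine.
    namespace: Dict[str, Any] = {"find": find, "recursive_query": nested}
    exec(generate_code(question), namespace)
    return namespace["execute_command"](image)


toy_image = ["cat", "cat", "dog"]
answer = recursive_query(toy_image, "Are there more cats than dogs?")
print(answer)  # prints "yes"
```

In this sketch the top-level routine stays short because counting is pushed into sub-routines, and each sub-routine chooses its own return type, which is the behavior the summary calls dynamic type assignment. The depth limit is an added safeguard for the toy example, not something the summary specifies.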