Pix2Code: Learning to Compose Neural Visual Concepts as Programs
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: The challenge in learning abstract concepts from images in an unsupervised fashion lies in the required integration of visual perception and generalizable relational reasoning. Moreover, the unsupervised nature of this task makes it necessary for human users to be able to understand a model's learnt concepts and potentially revise false behaviours. To tackle both the generalizability and interpretability constraints of visual concept learning, we propose Pix2Code, a framework that extends program synthesis to visual relational reasoning by utilizing the abilities of both explicit, compositional symbolic and implicit neural representations. This is achieved by retrieving object representations from images and synthesizing relational concepts as lambda-calculus programs. We evaluate the diverse properties of Pix2Code on the challenging reasoning domains Kandinsky Patterns and CURI, thereby testing its ability to identify compositional visual concepts that generalize to novel data and concept configurations. Particularly, in stark contrast to neural approaches, we show that Pix2Code's representations remain human interpretable and can be easily revised for improved performance.
DOI: 10.48550/arxiv.2402.08280
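
As an illustrative aside, the pipeline described in the abstract (extracting object representations from an image and composing relational concepts as small functional programs) can be pictured with a toy sketch. Everything below — the attribute names, the combinators, and the example concepts — is an assumption made for illustration only and is not taken from the Pix2Code paper or its implementation.

```python
# Toy sketch only: composing a relational visual concept as a small functional
# program over symbolic object representations. All names here (attr,
# same_for_all, exists_same, the "shape"/"color" attributes) are hypothetical
# illustrations, not the Pix2Code API.
from itertools import combinations

def attr(name):
    """Primitive: project an object representation onto one attribute."""
    return lambda obj: obj[name]

def same_for_all(f):
    """Concept combinator: f agrees on every pair of objects in the scene."""
    return lambda objs: all(f(a) == f(b) for a, b in combinations(objs, 2))

def exists_same(f):
    """Concept combinator: f agrees on at least one pair of objects."""
    return lambda objs: any(f(a) == f(b) for a, b in combinations(objs, 2))

# Object representations stand in for the output of a perception module.
scene_a = [{"shape": "circle", "color": "red"}, {"shape": "circle", "color": "blue"}]
scene_b = [{"shape": "circle", "color": "red"}, {"shape": "square", "color": "red"}]

# Composed concepts, analogous in spirit to synthesized lambda-calculus programs.
all_same_shape = same_for_all(attr("shape"))
some_same_color = exists_same(attr("color"))

print(all_same_shape(scene_a), all_same_shape(scene_b))    # True False
print(some_same_color(scene_a), some_same_color(scene_b))  # False True
```

Because a concept in this style is an explicit program rather than an opaque weight vector, it can be read and edited directly, which is the kind of interpretability and revisability the abstract emphasizes.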