Provably Learning Object-Centric Representations
Learning structured representations of the visual world in terms of objects promises to significantly improve the generalization abilities of current machine learning models. While recent efforts to this end have shown promising empirical progress, a theoretical account of when unsupervised object-c...
Gespeichert in:
Hauptverfasser: | , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Learning structured representations of the visual world in terms of objects
promises to significantly improve the generalization abilities of current
machine learning models. While recent efforts to this end have shown promising
empirical progress, a theoretical account of when unsupervised object-centric
representation learning is possible is still lacking. Consequently,
understanding the reasons for the success of existing object-centric methods as
well as designing new theoretically grounded methods remains challenging. In
the present work, we analyze when object-centric representations can provably
be learned without supervision. To this end, we first introduce two assumptions
on the generative process for scenes comprised of several objects, which we
call compositionality and irreducibility. Under this generative process, we
prove that the ground-truth object representations can be identified by an
invertible and compositional inference model, even in the presence of
dependencies between objects. We empirically validate our results through
experiments on synthetic data. Finally, we provide evidence that our theory
holds predictive power for existing object-centric models by showing a close
correspondence between models' compositionality and invertibility and their
empirical identifiability. |
---|---|
DOI: | 10.48550/arxiv.2305.14229 |