Disentangling Patterns and Transformations from One Sequence of Images with Shape-invariant Lie Group Transformer
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | An effective way to model the complex real world is to view it as a
composition of basic components: objects and transformations. Although humans
come to understand the compositionality of the real world through development,
it is extremely difficult to equip robots with such a learning mechanism. In
recent years, there has been significant research on autonomously learning
representations of the world using deep learning; however, most studies
have taken a statistical approach, which requires a large amount of training
data. In contrast to such existing methods, we take a novel algebraic approach
to representation learning based on a simpler and more intuitive formulation:
the observed world is a combination of multiple independent patterns and
transformations that are invariant to the shape of the patterns. Since the
shape of a pattern can be viewed as the feature that is invariant under
symmetric transformations such as translation or rotation, we can expect that
the patterns can be extracted naturally by expressing transformations with
symmetric Lie group transformers and attempting to reconstruct the scene with
them. Based on this idea, we propose a model that disentangles scenes into
the minimum number of basic components, patterns and Lie transformations, from
only one sequence of images, by introducing learnable shape-invariant Lie
group transformers as transformation components. Experiments show that, given
one sequence of images in which two objects are moving independently, the
proposed model can discover the hidden distinct objects and the multiple
shape-invariant transformations that constitute the scenes. |
---|---|
DOI: | 10.48550/arxiv.2203.11210 |
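
The summary describes transformations such as translation and rotation as Lie group elements that leave the shape of a pattern unchanged. The following is a minimal sketch of that idea only, not the model proposed in the paper: it builds a 2D rigid motion as the matrix exponential of a weighted sum of generators and applies it to a set of points. All names (`G_ROT`, `G_TX`, `G_TY`, `mat_exp`, `lie_transform`, `apply_transform`) and the choice of homogeneous coordinates are assumptions made for illustration.

```python
import math

# Generators of planar rigid motions in 3x3 homogeneous coordinates
# (illustrative assumption; the paper's parameterization may differ).
G_ROT = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
G_TX = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
G_TY = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]


def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, terms=30):
    """Approximate exp(A) with a truncated Taylor series sum_k A^k / k!."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def lie_transform(theta, tx, ty):
    """Group element exp(theta*G_ROT + tx*G_TX + ty*G_TY)."""
    algebra = [[theta * G_ROT[i][j] + tx * G_TX[i][j] + ty * G_TY[i][j]
                for j in range(3)] for i in range(3)]
    return mat_exp(algebra)


def apply_transform(t, points):
    """Apply a 3x3 homogeneous transform to a list of (x, y) points."""
    return [(t[0][0] * x + t[0][1] * y + t[0][2],
             t[1][0] * x + t[1][1] * y + t[1][2]) for x, y in points]


def pairwise_distances(points):
    """Distances between all point pairs; a simple stand-in for 'shape'."""
    return [math.dist(p, q) for i, p in enumerate(points)
            for q in points[i + 1:]]


triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
moved = apply_transform(lie_transform(0.7, 2.0, -1.0), triangle)
# The pairwise distances of `moved` match those of `triangle`
# up to floating-point error, so the triangle's shape is preserved.
```

Here the shape of the pattern is represented by the pairwise distances between its points, which the transformation preserves up to floating-point error. In practice a library routine such as `scipy.linalg.expm` would normally replace the hand-written series.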