Disentangling Patterns and Transformations from One Sequence of Images with Shape-invariant Lie Group Transformer

An effective way to model the complex real world is to view the world as a composition of basic components of objects and transformations. Although humans through development understand the compositionality of the real world, it is extremely difficult to equip robots with such a learning mechanism....

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Takada, T, Shimaya, W, Ohmura, Y, Kuniyoshi, Y
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:An effective way to model the complex real world is to view the world as a composition of basic components of objects and transformations. Although humans through development understand the compositionality of the real world, it is extremely difficult to equip robots with such a learning mechanism. In recent years, there has been significant research on autonomously learning representations of the world using the deep learning; however, most studies have taken a statistical approach, which requires a large number of training data. Contrary to such existing methods, we take a novel algebraic approach for representation learning based on a simpler and more intuitive formulation that the observed world is the combination of multiple independent patterns and transformations that are invariant to the shape of patterns. Since the shape of patterns can be viewed as the invariant features against symmetric transformations such as translation or rotation, we can expect that the patterns can naturally be extracted by expressing transformations with symmetric Lie group transformers and attempting to reconstruct the scene with them. Based on this idea, we propose a model that disentangles the scenes into the minimum number of basic components of patterns and Lie transformations from only one sequence of images, by introducing the learnable shape-invariant Lie group transformers as transformation components. Experiments show that given one sequence of images in which two objects are moving independently, the proposed model can discover the hidden distinct objects and multiple shape-invariant transformations that constitute the scenes.
DOI:10.48550/arxiv.2203.11210