Emergence and Function of Abstract Representations in Self-Supervised Transformers
Format: Article
Language: English
Online access: Order full text
Abstract: Human intelligence relies in part on our brains' ability to create abstract mental models that succinctly capture the hidden blueprint of our reality. Such abstract world models notably allow us to rapidly navigate novel situations by generalizing prior knowledge, a trait deep learning systems have historically struggled to replicate. However, the recent shift from supervised to self-supervised objectives, combined with expressive transformer-based architectures, has yielded powerful foundation models that appear to learn versatile representations that can support a wide range of downstream tasks. This promising development raises the intriguing possibility that such models develop in silico abstract world models. We test this hypothesis by studying the inner workings of small-scale transformers trained to reconstruct partially masked visual scenes generated from a simple blueprint. We show that the network develops intermediate abstract representations, or abstractions, that encode all semantic features of the dataset. These abstractions manifest as low-dimensional manifolds where the embeddings of semantically related tokens transiently converge, thus allowing for the generalization of downstream computations. Using precise manipulation experiments, we demonstrate that abstractions are central to the network's decision-making process. Our research also suggests that these abstractions are compositionally structured, exhibiting features like contextual independence and part-whole relationships that mirror the compositional nature of the dataset. Finally, we introduce a Language-Enhanced Architecture (LEA) designed to encourage the network to articulate its computations. We find that LEA develops an abstraction-centric language that can be easily interpreted, allowing us to more readily access and steer the network's decision-making process.
DOI: 10.48550/arxiv.2312.05361
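
For concreteness, the sketch below illustrates the kind of self-supervised setup the abstract describes: a small transformer trained to reconstruct masked tokens of synthetic scenes generated from a simple blueprint. This is not the authors' code; the toy blueprint, vocabulary size, masking rate, and model dimensions are all illustrative assumptions.

```python
# Minimal sketch (PyTorch) of masked-scene reconstruction with a small
# transformer. Everything below (blueprint, sizes, masking rate) is a
# hypothetical stand-in for the paper's actual dataset and model.
import torch
import torch.nn as nn

VOCAB, MASK, SEQ_LEN = 16, 0, 12  # toy vocabulary; token 0 is the [MASK] id

def make_scene(batch):
    """Toy 'blueprint': each scene is a base value plus a fixed pattern, so
    semantically related positions share hidden structure the model can learn."""
    base = torch.randint(1, VOCAB // 2, (batch, 1))
    return (base + torch.arange(SEQ_LEN)) % (VOCAB - 1) + 1

class TinyMaskedTransformer(nn.Module):
    def __init__(self, d=64, heads=4, layers=4):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d)
        self.pos = nn.Embedding(SEQ_LEN, d)
        enc = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, x):
        h = self.tok(x) + self.pos(torch.arange(SEQ_LEN, device=x.device))
        return self.head(self.encoder(h))

model = TinyMaskedTransformer()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
for step in range(200):  # short demo run
    scene = make_scene(32)
    mask = torch.rand(scene.shape) < 0.5           # mask half the tokens
    corrupted = scene.masked_fill(mask, MASK)
    logits = model(corrupted)
    # self-supervised objective: reconstruct only the masked positions
    loss = nn.functional.cross_entropy(logits[mask], scene[mask])
    opt.zero_grad(); loss.backward(); opt.step()
```

In a setup like this, the paper's analyses would then probe the encoder's intermediate embeddings, where the abstract reports that semantically related tokens transiently converge onto low-dimensional manifolds.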