Grounded Language Acquisition From Object and Action Imagery
Saved in:
| Main authors: | , , , |
|---|---|
| Format: | Article |
| Language: | eng |
| Subjects: | |
| Online access: | Order full text |
Abstract:

Deep learning approaches to natural language processing have made great strides in recent years. While these models produce symbols that convey vast amounts of diverse knowledge, it is unclear how such symbols are grounded in data from the world. In this paper, we explore the development of a private language for visual data representation by training emergent language (EL) encoders/decoders in both (i) a traditional referential game environment and (ii) a contrastive learning environment using a within-class matching training paradigm. An additional classification layer employing neural machine translation and random forest classification was used to transform the symbolic representations (sequences of integer symbols) into class labels. These methods were applied in two experiments focusing on object recognition and action recognition. For object recognition, a set of sketches produced by human participants from real imagery was used (Sketchy dataset); for action recognition, 2D trajectories were generated from 3D motion capture systems (MOVI dataset). To interpret the symbols produced for the data in each experiment, gradient-weighted class activation mapping (Grad-CAM) was used to identify pixel regions indicating semantic features that contribute evidence toward symbols in the learned languages. Additionally, t-distributed stochastic neighbor embedding (t-SNE) was used to investigate the embeddings learned by the CNN feature extractors.
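As a rough illustration of the referential-game setup described in the abstract, the following is a minimal sketch, not the authors' actual architecture: a hypothetical Speaker emits a fixed-length sequence of discrete symbols via a straight-through Gumbel-softmax relaxation, and a hypothetical Listener must identify the target among candidates from the message alone. All module names, dimensions, and hyperparameters are illustrative assumptions, and random tensors stand in for CNN features of sketches or trajectories.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative hyperparameters (assumptions, not taken from the paper).
VOCAB_SIZE, MSG_LEN, FEAT_DIM, N_CANDIDATES, BATCH = 16, 4, 64, 4, 32

class Speaker(nn.Module):
    """Maps a target's feature vector to a sequence of discrete symbols."""
    def __init__(self):
        super().__init__()
        self.to_logits = nn.Linear(FEAT_DIM, MSG_LEN * VOCAB_SIZE)

    def forward(self, target_feat, tau=1.0):
        logits = self.to_logits(target_feat).view(-1, MSG_LEN, VOCAB_SIZE)
        # Straight-through Gumbel-softmax: one-hot symbols in the forward pass,
        # differentiable surrogate in the backward pass.
        return F.gumbel_softmax(logits, tau=tau, hard=True)

class Listener(nn.Module):
    """Scores each candidate feature vector against the received message."""
    def __init__(self):
        super().__init__()
        self.msg_enc = nn.Linear(MSG_LEN * VOCAB_SIZE, FEAT_DIM)

    def forward(self, message, candidate_feats):
        m = self.msg_enc(message.view(message.size(0), -1))     # (B, FEAT_DIM)
        return torch.einsum("bf,bkf->bk", m, candidate_feats)   # (B, N_CANDIDATES)

speaker, listener = Speaker(), Listener()
opt = torch.optim.Adam(list(speaker.parameters()) + list(listener.parameters()), lr=1e-3)

for step in range(200):
    # Random tensors stand in for CNN features of the candidate images.
    candidates = torch.randn(BATCH, N_CANDIDATES, FEAT_DIM)
    target_idx = torch.randint(0, N_CANDIDATES, (BATCH,))
    target_feat = candidates[torch.arange(BATCH), target_idx]

    scores = listener(speaker(target_feat), candidates)
    loss = F.cross_entropy(scores, target_idx)  # listener must pick the target
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this style of game the message vocabulary and length bound the capacity of the emergent language; the contrastive within-class matching variant mentioned in the abstract would replace the target/distractor objective with a within-class matching one.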
DOI: 10.48550/arxiv.2309.06335
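The classification layer that maps symbol sequences to class labels can likewise be pictured with a minimal sketch of the random forest branch only (the neural machine translation branch is not shown), assuming fixed-length integer symbol sequences. The data below are synthetic stand-ins, not results from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 1,000 emergent-language messages, each a
# fixed-length sequence of 8 integer symbols from a 26-symbol vocabulary,
# paired with one of 10 object/action class labels.
messages = rng.integers(0, 26, size=(1000, 8))
labels = rng.integers(0, 10, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.2, random_state=0)

# Treat each symbol position as a feature and let the forest learn
# symbol-to-class regularities.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```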