Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Main authors: 
Format: Article
Language: English
Subjects: 
Online access: Order full text
Abstract: The ability to associate touch with other modalities has huge implications for humans and computational systems. However, multimodal learning with touch remains challenging due to the expensive data collection process and non-standardized sensor outputs. We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound. We achieve this by aligning our UniTouch embeddings to pretrained image embeddings already associated with a variety of other modalities. We further propose learnable sensor-specific tokens, allowing the model to learn from a set of heterogeneous tactile sensors, all at the same time. UniTouch is capable of conducting various touch sensing tasks in the zero-shot setting, from robot grasping prediction to touch image question answering. To the best of our knowledge, UniTouch is the first to demonstrate such capabilities. Project page: https://cfeng16.github.io/UniTouch/
DOI: 10.48550/arxiv.2401.18084
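
To make the idea in the abstract concrete, below is a minimal PyTorch sketch of the general recipe it describes: a tactile encoder with learnable sensor-specific tokens is trained so that its embeddings land in a frozen, pretrained image embedding space via a symmetric contrastive (InfoNCE-style) loss. All names (`TactileEncoder`, `alignment_loss`), dimensions, and the stand-in frozen image encoder are illustrative assumptions, not the authors' implementation; the real model would align to an actual pretrained image encoder such as a CLIP-style backbone.

```python
# Sketch of contrastive alignment of tactile embeddings to a frozen image
# embedding space, with learnable sensor-specific tokens. Hypothetical names
# and sizes; not the UniTouch reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TactileEncoder(nn.Module):
    """Small ViT-style encoder over tactile images, conditioned on sensor ID."""

    def __init__(self, num_sensors: int, embed_dim: int = 512,
                 patch: int = 16, img_size: int = 224, depth: int = 4):
        super().__init__()
        self.patchify = nn.Conv2d(3, embed_dim, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        # One learnable token per tactile sensor type, prepended to the sequence
        # so a single model can be trained on heterogeneous sensors at once.
        self.sensor_tokens = nn.Embedding(num_sensors, embed_dim)
        self.pos = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, touch: torch.Tensor, sensor_id: torch.Tensor) -> torch.Tensor:
        x = self.patchify(touch).flatten(2).transpose(1, 2)   # (B, N, D) patch tokens
        tok = self.sensor_tokens(sensor_id).unsqueeze(1)      # (B, 1, D) sensor token
        x = torch.cat([tok, x], dim=1) + self.pos
        x = self.norm(self.blocks(x))
        return F.normalize(x[:, 0], dim=-1)                   # sensor-token output as embedding


def alignment_loss(touch_emb, image_emb, temperature: float = 0.07):
    """Symmetric InfoNCE: paired touch/image embeddings attract, unpaired repel."""
    logits = touch_emb @ image_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Stand-in for a frozen, pretrained image encoder; a random frozen projection
    # is used here only so the sketch runs end to end without downloads.
    image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
    image_encoder.requires_grad_(False)

    touch_encoder = TactileEncoder(num_sensors=4)
    opt = torch.optim.AdamW(touch_encoder.parameters(), lr=1e-4)

    touch = torch.randn(8, 3, 224, 224)     # tactile images (toy data)
    image = torch.randn(8, 3, 224, 224)     # paired visual images (toy data)
    sensor_id = torch.randint(0, 4, (8,))   # which sensor captured each touch sample

    image_emb = F.normalize(image_encoder(image), dim=-1)
    loss = alignment_loss(touch_encoder(touch, sensor_id), image_emb)
    loss.backward()
    opt.step()
    print(f"alignment loss: {loss.item():.3f}")
```

Because only the tactile side is trained while the image embedding space stays frozen, touch inherits whatever modalities that space is already tied to (language, sound, etc.), which is what enables the zero-shot touch tasks mentioned in the abstract.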