Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning
We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of pro...
Gespeichert in:
Hauptverfasser: | , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We challenge a common assumption underlying most supervised deep learning:
that a model makes a prediction depending only on its parameters and the
features of a single input. To this end, we introduce a general-purpose deep
learning architecture that takes as input the entire dataset instead of
processing one datapoint at a time. Our approach uses self-attention to reason
about relationships between datapoints explicitly, which can be seen as
realizing non-parametric models using parametric attention mechanisms. However,
unlike conventional non-parametric models, we let the model learn end-to-end
from the data how to make use of other datapoints for prediction. Empirically,
our models solve cross-datapoint lookup and complex reasoning tasks unsolvable
by traditional deep learning models. We show highly competitive results on
tabular data, early results on CIFAR-10, and give insight into how the model
makes use of the interactions between points. |
---|---|
DOI: | 10.48550/arxiv.2106.02584 |