Higher Order Transformers: Efficient Attention Mechanism for Tensor Structured Data
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: Transformers are now ubiquitous for sequence modeling tasks, but their extension to multi-dimensional data remains a challenge due to the quadratic cost of the attention mechanism. In this paper, we propose Higher-Order Transformers (HOT), a novel architecture designed to efficiently process data with more than two axes, i.e., higher-order tensors. To address the computational challenges associated with high-order tensor attention, we introduce a novel Kronecker factorized attention mechanism that reduces the attention cost to quadratic in each axis' dimension, rather than quadratic in the total size of the input tensor. To further enhance efficiency, HOT leverages kernelized attention, reducing the complexity to linear. This strategy maintains the model's expressiveness while enabling scalable attention computation. We validate the effectiveness of HOT on two high-dimensional tasks: multivariate time series forecasting and 3D medical image classification. Experimental results demonstrate that HOT achieves competitive performance while significantly improving computational efficiency, showcasing its potential for tackling a wide range of complex, multi-dimensional data.
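The record contains no code, so the following PyTorch sketch is only an illustration, under assumptions, of the axis-wise attention idea described in the abstract: attention is applied separately along each axis of a third-order input, so each step is quadratic in that axis's length rather than in the total tensor size. The class names (`AxisAttention`, `FactorizedAttentionBlock`) and the sequential composition of per-axis attentions are illustrative assumptions, not the authors' implementation, and the kernelized (linear-complexity) variant mentioned in the abstract is omitted here.

```python
# Hypothetical sketch (not the authors' code): axis-wise attention over an
# input tensor of shape (batch, L1, L2, d). Attention is applied separately
# along each axis, so the cost is quadratic in L1 and in L2 rather than
# quadratic in L1 * L2.
import torch
import torch.nn as nn


class AxisAttention(nn.Module):
    """Standard multi-head self-attention applied along one tensor axis."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        # x: (batch, L1, L2, d); fold the non-attended axis into the batch.
        b, l1, l2, d = x.shape
        if axis == 1:
            seq = x.permute(0, 2, 1, 3).reshape(b * l2, l1, d)  # attend over L1
        else:
            seq = x.reshape(b * l1, l2, d)                      # attend over L2
        out, _ = self.attn(seq, seq, seq)                       # quadratic in that axis only
        if axis == 1:
            return out.reshape(b, l2, l1, d).permute(0, 2, 1, 3)
        return out.reshape(b, l1, l2, d)


class FactorizedAttentionBlock(nn.Module):
    """Applies attention along each axis in turn (an axis-factorized scheme)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.axis1 = AxisAttention(dim, num_heads)
        self.axis2 = AxisAttention(dim, num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.axis1(x, axis=1)  # residual attention along axis 1
        x = x + self.axis2(x, axis=2)  # residual attention along axis 2
        return x


if __name__ == "__main__":
    x = torch.randn(2, 16, 32, 64)          # (batch, L1, L2, d)
    block = FactorizedAttentionBlock(dim=64)
    print(block(x).shape)                   # torch.Size([2, 16, 32, 64])
```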
DOI: 10.48550/arxiv.2412.02919