Efficient Beam Tree Recursion
Main authors: Jishnu Ray Chowdhury, Cornelia Caragea
Format: Article
Language: English
Abstract: Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a simple extension of Gumbel Tree RvNN and was shown to achieve state-of-the-art length-generalization performance on ListOps while maintaining comparable performance on other tasks. However, although not the worst of its kind, BT-RvNN can still be exorbitantly expensive in memory usage. In this paper, we identify the main bottleneck in BT-RvNN's memory usage as the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and to further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by $10$-$16$ times but also set a new state of the art on ListOps while maintaining similar performance on other tasks. In addition, we propose a strategy to utilize the induced latent-tree node representations produced by BT-RvNN to turn it from a sentence encoder of the form $f: \mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence contextualizer of the form $f: \mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models.
DOI: 10.48550/arxiv.2307.10779
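The type-signature change the abstract describes can also be sketched in code. The helper below is hypothetical (the paper does not specify this pooling scheme); it only illustrates how induced latent-tree node representations could give every token its own contextualized vector, so the module maps $\mathbb{R}^{n \times d}$ to $\mathbb{R}^{n \times d}$ and can be stacked or interfaced with Transformer or SSM layers.

```python
import torch

def as_contextualizer(x, leaf_to_nodes, node_reprs):
    """Hypothetical sketch: re-contextualize each leaf token by pooling
    the latent-tree node representations that cover it.

    x:             (n, d)  input token embeddings
    leaf_to_nodes: (n, k)  indices of the k tree nodes covering each leaf
    node_reprs:    (m, d)  representations of all induced tree nodes
    """
    # gather each leaf's covering-node representations and mean-pool them,
    # yielding an (n, d) output with the same shape as the input
    ancestors = node_reprs[leaf_to_nodes]        # (n, k, d)
    return x + ancestors.mean(dim=1)             # (n, d) residual update
```

The point of the sketch is the shape contract: a sentence encoder returns one $(d,)$ vector per sequence, while a contextualizer returns $(n, d)$, which is what makes it composable with other sequence models.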