ATTENTION NEURAL NETWORKS WITH SPARSE ATTENTION MECHANISMS

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that a...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Guruganesh, Guru, Ahmed, Amr, Pham, Philip, Ainslie, Joshua Timothy, Ontañón, Santiago, Zaheer, Manzil, Dubey, Kumar Avinava
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing network inputs using an attention neural network that has one or more sparse attention sub-layers. Each sparse attention sub-layer is configured to apply a sparse attention mechanism that attends differently for input positions that are in a first proper subset of the input positions in the input to the sub-layer than for positions that are not in the first proper subset.