Neural Turing Machines: Convergence of Copy Tasks

Bibliographic details
Published in: arXiv.org 2016-12
Main author: Janez Aleš
Format: Article
Language: English
Online access: Full text
Description
Summary: The architecture of neural Turing machines is differentiable end to end and is trainable with gradient descent methods. Due to their large unfolded depth, neural Turing machines are hard to train, and because of their linear access to the complete memory, they do not scale. Other architectures have been studied to overcome these difficulties. In this report we focus on improving the quality of prediction of the original linear memory architecture on copy and repeat copy tasks. Copy task predictions on sequences six times longer than those the neural Turing machine was trained on prove to be highly accurate, and so do predictions of repeat copy tasks for sequences with twice the repetition number and twice the sequence length that the neural Turing machine was trained on.
ISSN:2331-8422