Training large-scale visual transducer neural network with variable tile size


Bibliographic details
Main authors: CHANEN MICHAEL TOBIAS, KORNBLISS SIMON, ZHAI XIAOHUA, AL-ABDUL MOHSEN IBRAHIM, MINDERER MATTHIAS JOHANNES LORENZ, KOLESNIKOV ALEXANDER, BAYER LUKAS KLAUS, CARON MATHILDE, PAVTIC, PHILIPPE, ISMAILOV PAVEL
Format: Patent
Language: chi; eng
Description
Summary: The invention relates to training a large-scale visual transducer neural network with variable tile size. A method of training the neural network includes, at each training step: obtaining a plurality of training images; obtaining a corresponding target output for each training image; selecting an image tile generation scheme from a plurality of image tile generation schemes, where each image tile generation scheme generates a different number of tiles from a given input image, and where each tile includes a respective subset of the pixels of the given input image; for each training image: generating a plurality of image tiles by applying the selected image tile generation scheme to the training image, and processing the plurality of image tiles using the neural network to generate a network output; and training the neural network on an objective function that measures, for each training image, a difference between the network output for the training image and the target output for the training image.
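
The training step summarized above can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names (`make_tiles`, `tile_features`, `train_step`, `mean_loss`) are hypothetical, each tile generation scheme is assumed to be a grid of square, non-overlapping tiles of a given side length, and a toy linear model with average pooling stands in for the neural network so that the same weights can accept tiles of any size. The pooling step and the squared-error objective are assumptions made for the illustration; the abstract does not specify them.

```python
import numpy as np


def make_tiles(image, tile_size):
    """Split an (H, W) image into non-overlapping tile_size x tile_size tiles."""
    h, w = image.shape
    if h % tile_size or w % tile_size:
        raise ValueError("tile_size must divide both image dimensions")
    grid = image.reshape(h // tile_size, tile_size, w // tile_size, tile_size)
    return grid.transpose(0, 2, 1, 3).reshape(-1, tile_size, tile_size)


def tile_features(tiles, base):
    """Average-pool each tile to base x base and flatten it.

    Assumption for this sketch: the tile side is a multiple of `base`.
    """
    n, t, _ = tiles.shape
    f = t // base
    pooled = tiles.reshape(n, base, f, base, f).mean(axis=(2, 4))
    return pooled.reshape(n, base * base)


def forward(weights, image, tile_size, base=4):
    """Toy stand-in network: linear map per tile, averaged over all tiles."""
    feats = tile_features(make_tiles(image, tile_size), base)
    return (feats @ weights).mean(axis=0)


def mean_loss(weights, images, targets, tile_size, base=4):
    """Mean squared difference between network outputs and target outputs."""
    errs = [forward(weights, im, tile_size, base) - tg
            for im, tg in zip(images, targets)]
    return float(np.mean([e @ e for e in errs]))


def train_step(weights, images, targets, tile_sizes, rng, base=4, lr=0.05):
    """One step: pick one tiling scheme, tile every image, update the weights."""
    tile_size = int(rng.choice(tile_sizes))  # one scheme selected per step
    grad = np.zeros_like(weights)
    for image, target in zip(images, targets):
        feats = tile_features(make_tiles(image, tile_size), base)
        mean_feat = feats.mean(axis=0)
        err = mean_feat @ weights - target        # network output minus target
        grad += 2.0 * np.outer(mean_feat, err)    # gradient of squared error
    return weights - lr * grad / len(images), tile_size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((8, 16, 16))   # synthetic stand-in training images
    targets = rng.random((8, 3))       # synthetic stand-in target outputs
    weights = np.zeros((16, 3))
    for step in range(100):
        weights, size = train_step(weights, images, targets, (4, 8, 16), rng)
    print("loss at tile size 8:", mean_loss(weights, images, targets, 8))
```

With a 16 x 16 image, the three assumed schemes (tile sides 4, 8 and 16) yield 16, 4 and 1 tiles respectively, which mirrors the requirement that each scheme generates a different number of tiles. Because the scheme is drawn anew at every step, the same weights are trained across all tile counts.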