TRAINING LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS WITH VARIABLE PATCH SIZES

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network that is configured to process an input image to generate a network output for the input image. In one aspect, a method comprises, at each of a plurality of training steps:...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Beyer, Lucas Klaus, Zhai, Xiaohua, Kolesnikov, Alexander, Tschannen, Michael Tobias, Kornblith, Simon, Caron, Mathilde, Minderer, Matthias Johannes Lorenz, Alabdulmohsin, Ibrahim, Izmailov, Pavel, Pavetic, Filip
Format:	Patent
Sprache:	eng
Schlagworte:	CALCULATING COMPUTING COUNTING PHYSICS
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network that is configured to process an input image to generate a network output for the input image. In one aspect, a method comprises, at each of a plurality of training steps: obtaining a plurality of training images for the training step; obtaining, for each of the plurality of training images, a respective target output; and selecting, from a plurality of image patch generation schemes, an image patch generation scheme for the training step, wherein, given an input image, each of the plurality of image patch generation schemes generates a different number of patches of the input image, and wherein each patch comprises a respective subset of the pixels of the input image.