UltraMNIST Classification: A Benchmark to Train CNNs for Very Large Images
Format: Article
Language: English
Abstract: Convolutional neural network (CNN) approaches available in the current literature are designed to work primarily with low-resolution images. When applied to very large images, they face challenges related to GPU memory, a receptive field smaller than needed for semantic correspondence, and the need to incorporate multi-scale features. The resolution of input images can be reduced, but at the cost of significant loss of critical information. Motivated by these issues, we introduce a novel research problem of training CNN models for very large images and present the 'UltraMNIST dataset', a simple yet representative benchmark for this task. UltraMNIST is built from the popular MNIST digits, with additional levels of complexity added to replicate the challenges of real-world problems. We present two variants of the problem: 'UltraMNIST classification' and 'Budget-aware UltraMNIST classification'. The standard UltraMNIST classification benchmark is intended to facilitate the development of novel CNN training methods that make effective use of the best available GPU resources. The budget-aware variant is intended to promote the development of methods that work under constrained GPU memory. To enable the development of competitive solutions, we present several baseline models for the standard benchmark and its budget-aware variant. We study the effect of reducing resolution on performance and present results for baseline models with pretrained backbones drawn from popular state-of-the-art models. Finally, with the presented benchmark dataset and baselines, we hope to pave the way for a new generation of CNN methods that handle large images in an efficient and resource-light manner.
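To make the GPU-memory challenge concrete, the following is a minimal back-of-the-envelope sketch (not from the paper) of the activation memory a single early convolutional layer would need at full resolution versus after downsampling. The 4000x4000 input size and the 64-channel layer are illustrative assumptions.

```python
# Hedged sketch: why naive CNN training on very large images exhausts GPU memory.
# The image size (4000x4000) and channel count (64) are illustrative assumptions,
# not specifications taken from the UltraMNIST paper.

def activation_memory_gb(height, width, channels, batch=1, bytes_per_value=4):
    """Memory (GB) for one float32 activation tensor of shape (batch, channels, H, W)."""
    return batch * channels * height * width * bytes_per_value / 1e9

# Hypothetical very large grayscale input passed through a 64-channel conv layer.
full = activation_memory_gb(4000, 4000, 64)   # full resolution
small = activation_memory_gb(512, 512, 64)    # after downsampling to 512x512

print(f"4000x4000 first-layer activations: {full:.3f} GB")
print(f"512x512 first-layer activations:   {small:.3f} GB")
```

A single layer's activations at full resolution already run into gigabytes, and a deep network stores many such tensors for backpropagation; downsampling shrinks memory quadratically but, as the abstract notes, discards fine detail such as small digits.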
DOI: 10.48550/arxiv.2206.12681