Adaptive Locality Guidance: Using Locality Guidance to Initialize the Learning of Vision Transformers on Tiny Datasets
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2025-01, pp. 1-15
Main authors: , ,
Format: Article
Language: English
Abstract: While we keep working toward leveraging the benefits of vision transformers (VTs) on small datasets, convolutional neural networks (CNNs) still remain the choice of preference when extensive training data is unavailable. As studies show that lack of sufficient data leads VTs to mainly learn global information from the input, the recently proposed locality guidance (LG) approach uses a lightweight CNN pretrained on the same dataset to guide the VT into learning local features as well. Under a dual learning framework, the use of the LG significantly boosts the accuracy of different VTs on multiple tiny datasets, at the mere cost of a slight increase in training time. However, we also find that the use of the LG prevents the models from learning global aspects to their full ability, sometimes leading to worsened performances compared to the original baselines. In order to overcome this limitation, we propose the adaptive LG (ALG), an improved version which uses the LG as an initialization tool, and after a certain number of epochs lets the VT learn by itself in a supervised fashion. Specifically, we estimate the needed duration for the LG based on a threshold set on the evolution of the distance separating the features of the VT from those of the lightweight CNN used for guidance. Since our improved method can be used in a plug-and-play fashion, we successfully apply it across ten different VTs, and five different datasets. Experimental results show that the proposed ALG significantly reduces the computational cost added in training by the LG (by 37%–64%), and further increases the validation accuracy by up to 6.71%.
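The abstract states that ALG stops the locality guidance once the evolution of the VT–CNN feature distance crosses a threshold, after which the VT trains in a purely supervised fashion. The sketch below illustrates one plausible form of such a stopping rule; the function name, the relative-change criterion, and the `patience` parameter are illustrative assumptions, not the authors' published criterion.

```python
def alg_switch_epoch(distances, rel_threshold=0.05, patience=1):
    """Return the epoch at which to disable locality guidance (LG).

    `distances[t]` is the mean distance between VT features and the
    lightweight CNN's features at epoch t. Guidance is switched off
    once the relative change of that distance stays below
    `rel_threshold` for `patience` consecutive epochs, i.e. once the
    VT has absorbed most of the local-feature signal the CNN provides.
    This stopping rule is a hypothetical sketch of the thresholding
    idea described in the abstract.
    """
    streak = 0
    for t in range(1, len(distances)):
        prev, cur = distances[t - 1], distances[t]
        rel_change = abs(prev - cur) / max(prev, 1e-12)
        streak = streak + 1 if rel_change < rel_threshold else 0
        if streak >= patience:
            return t
    # Distance never plateaued: keep guidance for the whole schedule.
    return len(distances) - 1
```

In a training loop, the total loss would then include the guidance term only for epochs before the returned index, which is how ALG recovers the LG accuracy gains while cutting the extra training cost.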
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2024.3515076