Multimodal Framework for Long-Tailed Recognition
Published in: | Applied sciences 2024-11, Vol.14 (22), p.10572 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | A long-tailed data distribution (i.e., a few classes account for most of the data, while most classes have very few samples) is a common problem in image classification. In this paper, we propose a novel multimodal framework for long-tailed data recognition. In the first stage, long-tailed data are used for visual-semantic contrastive learning to obtain good features, while in the second stage, class-balanced data are used for classifier training. The proposed framework leverages the advantages of multimodal models and mitigates the problem of class imbalance in long-tailed data recognition. Experimental results demonstrate that the proposed framework achieves competitive performance on the CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist2018 datasets for image classification. |
---|---|
ISSN: | 2076-3417 |
DOI: | 10.3390/app142210572 |
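
The summary says the second stage trains the classifier on class-balanced data. The record does not say how that balance is obtained, so the following is only a minimal sketch of one common option, class-balanced sampling, in which a class is drawn uniformly first and then an example within it. The function name `class_balanced_sample` and the toy label counts are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def class_balanced_sample(labels, num_samples, seed=0):
    """Draw indices so that every class is equally likely, whatever its size.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)

    picks = []
    for _ in range(num_samples):
        cls = rng.choice(classes)                # uniform over classes
        picks.append(rng.choice(by_class[cls]))  # uniform within the class
    return picks


# Toy long-tailed label list (assumed counts): one head class, two tail classes.
labels = [0] * 900 + [1] * 90 + [2] * 10
picks = class_balanced_sample(labels, num_samples=3000)
balanced_counts = Counter(labels[i] for i in picks)

print("original:", Counter(labels))
print("resampled:", balanced_counts)
```

Because tail-class examples are drawn with replacement, they repeat far more often than head-class examples; the first-stage features would still be learned from the original long-tailed data, as the summary describes.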