Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Long-tailed data distributions pose challenges for a variety of domains like
e-commerce, finance, biomedical science, and cyber security, where the
performance of machine learning models is often dominated by head categories
while tail categories are inadequately learned. This work aims to provide a
systematic view of long-tailed learning with regard to three pivotal angles:
(A1) the characterization of data long-tailedness, (A2) the data complexity of
various domains, and (A3) the heterogeneity of emerging tasks. We develop
HeroLT, a comprehensive long-tailed learning benchmark integrating 18
state-of-the-art algorithms, 10 evaluation metrics, and 17 real-world datasets
across 6 tasks and 4 data modalities. HeroLT with novel angles and extensive
experiments (315 in total) enables effective and fair evaluation of newly
proposed methods compared with existing baselines on varying dataset types.
Finally, we conclude by highlighting the significant applications of
long-tailed learning and identifying several promising future directions. For
accessibility and reproducibility, we open-source our benchmark HeroLT and
corresponding results at https://github.com/SSSKJ/HeroLT. |
DOI: | 10.48550/arxiv.2307.08235 |