M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Training state-of-the-art (SOTA) deep models often requires extensive data,
resulting in substantial training and storage costs. To address these
challenges, dataset condensation has been developed to learn a small synthetic
set that preserves essential information from the original large-scale dataset.
Optimization-oriented methods are currently the primary approach in the
field of dataset condensation for achieving SOTA results. However, the bi-level
optimization process hinders the practical application of such methods to
realistic and larger datasets. To enhance condensation efficiency, previous
works proposed Distribution-Matching (DM) as an alternative, which
significantly reduces the condensation cost. Nonetheless, current DM-based
methods still fall short of the results of SOTA optimization-oriented
methods. In this paper, we argue that existing DM-based methods overlook the
higher-order alignment of the distributions, which may lead to sub-optimal
matching results. Inspired by this, we present a novel DM-based method named
M3D for dataset condensation by Minimizing the Maximum Mean Discrepancy between
feature representations of the synthetic and real images. By embedding their
distributions in a reproducing kernel Hilbert space, we align all orders of
moments of the distributions of real and synthetic images, resulting in a more
generalized condensed set. Notably, our method even surpasses the SOTA
optimization-oriented method IDC on the high-resolution ImageNet dataset.
Extensive analysis is conducted to verify the effectiveness of the proposed
method. Source code is available at https://github.com/Hansong-Zhang/M3D. |
---|---|
DOI: | 10.48550/arxiv.2312.15927 |
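
The summary describes minimizing the Maximum Mean Discrepancy (MMD) between feature representations of real and synthetic images. The following is a minimal NumPy sketch of that quantity, not the authors' implementation (see the linked repository for that). The Gaussian kernel, the `bandwidth` value, the biased estimator, and the random vectors standing in for network features are all assumptions made for illustration. With a characteristic kernel such as the Gaussian, a squared MMD of zero between two distributions implies that they are identical, which is the sense in which embedding in a reproducing kernel Hilbert space aligns all orders of moments.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(real_feats, syn_feats, bandwidth=1.0):
    """Biased estimate of the squared MMD between two feature sets.

    Equals the squared RKHS distance between the two empirical mean
    embeddings: mean k(r, r') + mean k(s, s') - 2 * mean k(r, s).
    """
    k_rr = gaussian_kernel(real_feats, real_feats, bandwidth)
    k_ss = gaussian_kernel(syn_feats, syn_feats, bandwidth)
    k_rs = gaussian_kernel(real_feats, syn_feats, bandwidth)
    return k_rr.mean() + k_ss.mean() - 2.0 * k_rs.mean()


# Hypothetical stand-ins for extracted features (assumption: 16-dim vectors).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 16))
syn_close = rng.normal(0.0, 1.0, size=(20, 16))  # same distribution as real
syn_far = rng.normal(3.0, 1.0, size=(20, 16))    # shifted distribution

print("MMD^2, matching synthetic set:", mmd_squared(real, syn_close, bandwidth=4.0))
print("MMD^2, shifted synthetic set: ", mmd_squared(real, syn_far, bandwidth=4.0))
```

In a condensation setting the synthetic features would depend on learnable synthetic images, and this value would serve as a loss to be minimized with an automatic-differentiation framework. The sketch only evaluates the discrepancy for fixed inputs.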