Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: Diffusion Probability Models (DPMs) have made impressive advancements in various machine learning domains. However, achieving high-quality synthetic samples typically requires a large number of sampling steps, which impedes real-time sample synthesis. Traditional accelerated sampling algorithms based on knowledge distillation rely on pre-trained model weights and discrete time-step scenarios, necessitating additional training sessions to achieve their goals. To address these issues, we propose Catch-Up Distillation (CUD), which encourages the current-moment output of the velocity estimation model to "catch up" with its previous-moment output. Specifically, CUD adjusts the original Ordinary Differential Equation (ODE) training objective to align the current-moment output with both the ground-truth label and the previous-moment output, utilizing Runge-Kutta-based multi-step alignment distillation for precise ODE estimation while preventing asynchronous updates. Furthermore, we investigate the design space of CUD under continuous time-step scenarios and analyze how to determine suitable strategies. To demonstrate CUD's effectiveness, we conduct thorough ablation and comparison experiments on CIFAR-10, MNIST, and ImageNet-64. On CIFAR-10, we obtain an FID of 2.80 by sampling in 15 steps under one-session training, and a new state-of-the-art FID of 3.37 by sampling in one step with additional training. The latter result required only 620k iterations with a batch size of 128, in contrast to Consistency Distillation, which demanded 2100k iterations with a larger batch size of 256. Our code is released at https://anonymous.4open.science/r/Catch-Up-Distillation-E31F.
DOI: 10.48550/arxiv.2305.10769
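Note on the method described in the abstract: the catch-up objective aligns the model's current-moment velocity prediction with both the ground-truth velocity and its own previous-moment prediction, the latter obtained by stepping along the ODE with a Runge-Kutta update. The sketch below is a minimal illustration of that idea, assuming a rectified-flow-style linear interpolation between data and noise and a Heun (second-order Runge-Kutta) step; the names (velocity_net, catch_up_loss, lambda_cu, dt) and the exact loss weighting are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
# Minimal sketch of a catch-up-style training objective (illustrative only).
# Assumes a rectified-flow-style interpolation x_t = (1 - t) * x0 + t * noise,
# so the ground-truth ODE velocity is (noise - x0). Image-shaped x0 (B, C, H, W).
import torch
import torch.nn.functional as F


def catch_up_loss(velocity_net, x0, dt=0.05, lambda_cu=1.0):
    """Align the current-moment prediction with the ground-truth velocity
    and with a detached previous-moment prediction of the same network."""
    b = x0.shape[0]
    noise = torch.randn_like(x0)
    # Sample continuous times in [dt, 1) so that t - dt stays valid.
    t = torch.rand(b, device=x0.device) * (1.0 - dt) + dt
    t_prev = t - dt

    t_b = t.view(-1, 1, 1, 1)
    x_t = (1.0 - t_b) * x0 + t_b * noise
    v_true = noise - x0                      # ground-truth ODE velocity

    # Current-moment prediction (gradients flow through this branch).
    v_t = velocity_net(x_t, t)

    # Previous-moment target: step the ODE toward t - dt with a Heun
    # (second-order Runge-Kutta) update, then re-evaluate the network.
    with torch.no_grad():
        v1 = v_t.detach()
        x_mid = x_t - dt * v1
        v2 = velocity_net(x_mid, t_prev)
        x_prev = x_t - 0.5 * dt * (v1 + v2)  # Heun step toward t - dt
        v_prev = velocity_net(x_prev, t_prev)

    ode_term = F.mse_loss(v_t, v_true)       # match the ground-truth label
    catch_up_term = F.mse_loss(v_t, v_prev)  # "catch up" with the t - dt output
    return ode_term + lambda_cu * catch_up_term


# Hypothetical usage: loss = catch_up_loss(model, batch); loss.backward()
```

The previous-moment branch is evaluated without gradients, so it acts as the distillation target: the online network is pulled toward both the ground-truth velocity and its own slightly earlier prediction, which is the "catch up" behavior the abstract refers to.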