Precise Knowledge Transfer via Flow Matching
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | In this paper, we propose a novel knowledge transfer framework that
introduces continuous normalizing flows for progressive knowledge
transformation and leverages multi-step sampling strategies to achieve
precise knowledge transfer. We name this framework Knowledge Transfer with
Flow Matching (FM-KT). It can be integrated with a metric-based distillation
method of any form (e.g., vanilla KD, DKD, PKD and DIST) and a
meta-encoder with any available architecture (e.g., CNN, MLP and
Transformer). By introducing stochastic interpolants, FM-KT is readily amenable
to arbitrary noise schedules (e.g., VP-ODE, VE-ODE, rectified flow)
for normalized flow path estimation. We theoretically demonstrate that the
training objective of FM-KT is equivalent to minimizing an upper bound on the
negative log-likelihood of the teacher feature map or logit. In addition, FM-KT
can be viewed as a unique implicit ensemble method that leads to performance
gains. With a slight modification, FM-KT can also be transformed into an
online distillation framework, OFM-KT, with desirable performance gains. Through
extensive experiments on the CIFAR-100, ImageNet-1k, and MS-COCO datasets, we
empirically validate the scalability and state-of-the-art performance of our
proposed methods among relevant comparison approaches. |
---|---|
DOI: | 10.48550/arxiv.2402.02012 |
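
The abstract mentions a rectified-flow noise schedule and multi-step sampling without giving formulas. As a rough illustration only, the sketch below shows generic flow matching with a linear interpolant and fixed-step Euler sampling. It is not the FM-KT algorithm from the paper: the function names, the choice of `x0` as a starting point and `x1` as a teacher-like target, the time direction, and the squared-error loss are all assumptions made for this example.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Linear (rectified-flow style) interpolant between x0 (t=0) and x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Time derivative of the linear interpolant; constant along the path."""
    return x1 - x0

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between a predicted velocity and the target velocity.

    velocity_fn is a hypothetical stand-in for a learned model v(x_t, t).
    """
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - target_velocity(x0, x1)) ** 2))

def euler_sample(velocity_fn, x0, num_steps):
    """Multi-step sampling: integrate dx/dt = v(x, t) from t=0 to t=1
    with num_steps equal Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x
```

With an idealized velocity field that always points straight at a fixed target, the Euler iterates stay on the straight-line path and end at the target, whichever step count is chosen. With a learned, imperfect field, the number of steps would trade compute against accuracy.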