IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors
Saved in:
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Knowledge distillation (KD) has been proven to be useful for training compact
object detection models. However, we observe that KD is often effective when
the teacher model and student counterpart share similar proposal information.
This explains why existing KD methods are less effective for 1-bit detectors,
caused by a significant information discrepancy between the real-valued teacher
and the 1-bit student. This paper presents an Information Discrepancy-aware
strategy (IDa-Det) to distill 1-bit detectors that can effectively eliminate
information discrepancies and significantly reduce the performance gap between
a 1-bit detector and its real-valued counterpart. We formulate the distillation
process as a bi-level optimization problem. At the inner level, we select
the representative proposals with maximum information discrepancy. We then
introduce a novel entropy distillation loss to reduce the disparity based on
the selected proposals. Extensive experiments demonstrate IDa-Det's superiority
over state-of-the-art 1-bit detectors and KD methods on both PASCAL VOC and
COCO datasets. IDa-Det achieves 76.9% mAP for a 1-bit Faster-RCNN with a
ResNet-18 backbone. Our code is open-sourced on
https://github.com/SteveTsui/IDa-Det. |
---|---|
DOI: | 10.48550/arxiv.2210.03477 |
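
The summary above sketches a two-step recipe: an inner level that picks the proposals on which the real-valued teacher and the 1-bit student disagree most, followed by an entropy distillation loss on those selected proposals. The snippet below is a minimal PyTorch sketch of that recipe only; it is not the paper's implementation, and the Gaussian-entropy discrepancy measure, the top-k selection, the KL-based loss, and all names and parameters (gaussian_entropy, select_discrepant_proposals, entropy_distillation_loss, k, tau) are assumptions introduced purely for illustration.

```python
# Illustrative sketch only (not the authors' code): select the proposals with
# the largest teacher/student "information discrepancy" and apply an
# entropy-style distillation loss to them. The Gaussian-entropy gap, the top-k
# selection, the KL-based loss, and every name below are assumptions.
import math

import torch
import torch.nn.functional as F


def gaussian_entropy(feats: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-dimension differential entropy of each proposal's features, modelled
    as a Gaussian: H = 0.5 * log(2 * pi * e * var). feats has shape (N, C)."""
    var = feats.var(dim=1, unbiased=False) + eps
    return 0.5 * torch.log(2.0 * math.pi * math.e * var)


def select_discrepant_proposals(t_feats: torch.Tensor,
                                s_feats: torch.Tensor,
                                k: int) -> torch.Tensor:
    """Inner level, as described in the summary: pick the k proposals whose
    teacher/student information discrepancy is largest. Here the discrepancy
    is approximated by the absolute entropy gap (an assumption)."""
    gap = (gaussian_entropy(t_feats) - gaussian_entropy(s_feats)).abs()
    return torch.topk(gap, k=min(k, gap.numel())).indices


def entropy_distillation_loss(t_feats: torch.Tensor,
                              s_feats: torch.Tensor,
                              idx: torch.Tensor,
                              tau: float = 1.0) -> torch.Tensor:
    """A plausible entropy-style distillation term: soften the selected
    proposal features into distributions and match them with KL divergence."""
    p_teacher = F.softmax(t_feats[idx] / tau, dim=1)
    log_p_student = F.log_softmax(s_feats[idx] / tau, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau * tau


if __name__ == "__main__":
    teacher = torch.randn(128, 256)        # real-valued teacher proposal features
    student = torch.randn(128, 256) * 0.3  # toy stand-in for 1-bit student features
    idx = select_discrepant_proposals(teacher, student, k=16)
    loss = entropy_distillation_loss(teacher, student, idx)
    print(f"selected {idx.numel()} proposals, loss = {loss.item():.4f}")
```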