Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection
Format: | Article |
---|---|
Language: | English |
Abstract: | This paper presents Incremental Vision-Language Object Detection (IVLOD), a
novel learning task designed to incrementally adapt pre-trained Vision-Language
Object Detection Models (VLODMs) to various specialized domains, while
simultaneously preserving their zero-shot generalization capabilities for the
generalized domain. To address this new challenge, we present the
Zero-interference Reparameterizable Adaptation (ZiRa), a novel method that
introduces Zero-interference Loss and reparameterization techniques to tackle
IVLOD without incurring additional inference costs or a significant increase in
memory usage. Comprehensive experiments on COCO and ODinW-13 datasets
demonstrate that ZiRa effectively safeguards the zero-shot generalization
ability of VLODMs while continuously adapting to new tasks. Specifically, after
training on ODinW-13 datasets, ZiRa exhibits superior performance compared to
CL-DETR and iDETR, boosting zero-shot generalizability by a substantial 13.91 and
8.74 AP, respectively. Our code is available at
https://github.com/JarintotionDin/ZiRaGroundingDINO. |
DOI: | 10.48550/arxiv.2403.01680 |
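The abstract credits reparameterization for letting ZiRa adapt to new domains without additional inference cost. The general idea can be illustrated with a minimal, dependency-free sketch. It assumes a frozen linear layer with a parallel low-rank linear adapter branch used during training. This is a generic illustration of structural reparameterization, not necessarily the branch design used in ZiRa, and all function names are hypothetical. Because the adapter is linear in the input, it can be folded into the base weight after training, so inference runs a single matrix-vector product. The Zero-interference Loss is not modeled here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain-Python matrix-vector product m @ x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def two_branch_forward(w: Matrix, b: Vector, down: Matrix, up: Matrix, x: Vector) -> Vector:
    """Training-time forward (hypothetical): frozen base branch W x + b
    plus a parallel low-rank adapter branch U (D x)."""
    base = matvec(w, x)
    adapter = matvec(up, matvec(down, x))
    return [bi + yi + ai for bi, yi, ai in zip(b, base, adapter)]


def merge_branches(w: Matrix, down: Matrix, up: Matrix) -> Matrix:
    """Fold the linear adapter into the base weight: W' = W + U @ D."""
    return add(w, matmul(up, down))


def merged_forward(w_merged: Matrix, b: Vector, x: Vector) -> Vector:
    """Inference-time forward after merging: one product, no extra branch."""
    return [bi + yi for bi, yi in zip(b, matvec(w_merged, x))]


# Usage: a 3-input, 2-output layer with a rank-1 adapter.
w = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # frozen base weight (2 x 3)
b = [0.5, -0.5]                            # base bias
down = [[0.5, 0.25, -0.5]]                 # adapter down-projection D (1 x 3)
up = [[1.0], [-2.0]]                       # adapter up-projection U (2 x 1)
x = [1.0, 2.0, 3.0]

w_merged = merge_branches(w, down, up)
print(two_branch_forward(w, b, down, up, x))  # → [7.0, -0.5]
print(merged_forward(w_merged, b, x))         # → [7.0, -0.5]
```

Both forward passes give the same output, which is why a merged model keeps the original layer's inference cost. The merge only works because the adapter branch is purely linear; a nonlinearity inside the branch would prevent folding it into a single weight matrix.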