DOTA: Distributional Test-Time Adaptation of Vision-Language Models
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Vision-language foundation models (e.g., CLIP) have shown remarkable
performance across a wide range of tasks. However, deploying these models may
be unreliable when significant distribution gaps exist between the training and
test data. The training-free test-time dynamic adapter (TDA) is a promising
approach to address this issue by storing representative test samples to guide
the classification of subsequent ones. However, TDA only naively maintains a
limited number of reference samples in the cache, leading to severe test-time
catastrophic forgetting when the cache is updated by dropping samples. In this
paper, we propose a simple yet effective method for DistributiOnal Test-time
Adaptation (Dota). Instead of naively memorizing representative test samples,
Dota continually estimates the distributions of test samples, allowing the
model to continually adapt to the deployment environment. The test-time
posterior probabilities are then computed using the estimated distributions
based on Bayes' theorem for adaptation purposes. To further enhance the
adaptability to uncertain samples, we introduce a new human-in-the-loop
paradigm that identifies uncertain samples, collects human feedback, and
incorporates it into the Dota framework. Extensive experiments validate that
Dota enables CLIP to continually learn, resulting in a significant improvement
compared to current state-of-the-art methods. |
---|---|
DOI: | 10.48550/arxiv.2409.19375 |
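
The two steps the abstract describes, continually estimating per-class distributions from the test stream and then computing posteriors with Bayes' theorem, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the record: the diagonal Gaussian class model, the weighted running-moment update, and the hypothetical names `OnlineGaussianClassifier`, `update`, `posterior`, and `var_floor`. In practice `x` would be an image feature and `probs` the model's current class probabilities; a human-provided label for an uncertain sample could be passed as a one-hot `probs`.

```python
import math


class OnlineGaussianClassifier:
    """Running per-class Gaussian estimates with a Bayes-rule posterior.

    Assumption: each class is a diagonal-covariance Gaussian. The abstract
    does not say which distribution family Dota actually uses.
    """

    def __init__(self, num_classes, dim, var_floor=1e-3):
        self.weight = [0.0] * num_classes                    # soft sample counts
        self.mean = [[0.0] * dim for _ in range(num_classes)]
        self.m2 = [[0.0] * dim for _ in range(num_classes)]  # weighted squared deviations
        self.var_floor = var_floor                           # keeps variances positive

    def update(self, x, probs):
        """Fold one test feature into each class, weighted by probs[k]."""
        for k, w in enumerate(probs):
            if w <= 0.0:
                continue
            self.weight[k] += w
            for d, xd in enumerate(x):
                delta = xd - self.mean[k][d]
                self.mean[k][d] += (w / self.weight[k]) * delta
                self.m2[k][d] += w * delta * (xd - self.mean[k][d])

    def posterior(self, x):
        """Return p(class | x), proportional to p(x | class) * p(class)."""
        total = sum(self.weight)
        n = len(self.weight)
        if total == 0.0:
            return [1.0 / n] * n  # nothing observed yet: uniform
        logits = []
        for k in range(n):
            if self.weight[k] == 0.0:
                logits.append(-math.inf)  # class never observed
                continue
            log_p = math.log(self.weight[k] / total)  # log prior
            for d, xd in enumerate(x):
                var = self.m2[k][d] / self.weight[k] + self.var_floor
                diff = xd - self.mean[k][d]
                log_p -= 0.5 * (math.log(2 * math.pi * var) + diff * diff / var)
            logits.append(log_p)
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]  # stable softmax
        z = sum(exps)
        return [e / z for e in exps]


# Toy usage: two classes, two-dimensional features, one-hot weights.
clf = OnlineGaussianClassifier(num_classes=2, dim=2)
clf.update([1.0, 1.0], [1.0, 0.0])
clf.update([-1.0, -1.0], [0.0, 1.0])
print(clf.posterior([0.9, 1.1]))  # most of the mass falls on class 0
```

Because only counts, means, and squared deviations are stored, memory stays constant as the stream grows and no earlier sample has to be evicted, which is the contrast with a fixed-size cache that the abstract draws.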