Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity
Format: Article
Language: English
Abstract: Vision-language pre-training (VLP) has emerged as an effective scheme for multimodal representation learning, but its reliance on large-scale multimodal data poses significant challenges for medical applications. Federated learning (FL) offers a promising solution to scale up the dataset for medical VLP while preserving data privacy. However, we observe that client data heterogeneity in real-world scenarios can cause models to learn biased cross-modal alignment during local pre-training, which limits the transferability of the federally learned representation model to downstream tasks. To address this challenge, we propose Federated Distributionally Robust Alignment (FedDRA), a framework for federated VLP that achieves robust vision-language alignment under heterogeneous conditions. Based on client datasets, we construct a distribution family that encompasses potential test-time domains, and apply a distributionally robust framework to optimize the pre-trained model's performance across this distribution space. This approach bridges the gap between pre-training samples and downstream applications. To avoid over-fitting on client-specific information, we use an anchor representation from the global model to guide local training, and adopt a two-stage approach that first tunes deeper layers before updating the entire network. Extensive experiments on real-world datasets demonstrate FedDRA's effectiveness in enhancing medical federated VLP under data heterogeneity. Our method also adapts well to various medical pre-training methods.
DOI: 10.48550/arxiv.2404.03854
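The abstract describes three ingredients: a distributionally robust objective over a family of client-derived distributions, an anchor representation from the global model that regularizes local training, and a two-stage schedule that tunes deeper layers before the full network. The sketch below is one plausible reading of these pieces in PyTorch; the helper names (`robust_client_weights`, `anchor_regularizer`, `two_stage_local_update`), the softmax-based uncertainty set, the MSE placeholder for the alignment loss, and the layer split are illustrative assumptions, not FedDRA's actual formulation.

```python
# Hedged sketch of distributionally robust client weighting plus an
# anchor-guided, two-stage local update, loosely following the abstract.
# All names and hyperparameters here are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


def robust_client_weights(client_losses: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Soft worst-case reweighting over the client distribution family.

    Clients with larger loss receive larger weight, approximating optimization
    against the hardest mixture of client distributions (one common DRO
    surrogate; the paper's distribution family may be constructed differently).
    """
    return torch.softmax(client_losses / temperature, dim=0)


def anchor_regularizer(local_feats: torch.Tensor, anchor_feats: torch.Tensor) -> torch.Tensor:
    """Penalize drift of local representations from the global (anchor) model,
    discouraging over-fitting to client-specific information."""
    return F.mse_loss(local_feats, anchor_feats.detach())


def two_stage_local_update(model: nn.Sequential,
                           anchor: nn.Sequential,
                           x: torch.Tensor,
                           y: torch.Tensor,
                           stage: int,
                           split: int = 2,
                           lambda_anchor: float = 0.1) -> torch.Tensor:
    """One local step. Stage 1 tunes only the deeper layers (index >= split);
    stage 2 updates the entire network."""
    for i, layer in enumerate(model):
        layer.requires_grad_(stage == 2 or i >= split)

    feats, anchor_feats = model(x), anchor(x)
    task_loss = F.mse_loss(feats, y)  # placeholder for the real VLP alignment loss
    loss = task_loss + lambda_anchor * anchor_regularizer(feats, anchor_feats)
    loss.backward()
    return loss.detach()


def make_encoder() -> nn.Sequential:
    # Toy stand-in for a vision or text encoder.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))


if __name__ == "__main__":
    torch.manual_seed(0)
    model, anchor = make_encoder(), make_encoder()
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    # Simulated per-client losses -> robust aggregation weights on the server.
    client_losses = torch.tensor([0.9, 0.4, 1.5])
    print("robust weights:", robust_client_weights(client_losses))

    # One stage-1 local step on a toy batch.
    x, y = torch.randn(32, 8), torch.randn(32, 4)
    opt.zero_grad()
    print("local loss:", two_stage_local_update(model, anchor, x, y, stage=1).item())
    opt.step()
```

The softmax reweighting is only one standard surrogate for a worst-case mixture (a KL-regularized max-player); whether FedDRA uses this particular uncertainty set, and how its distribution family covers test-time domains, is specified in the paper rather than here.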