Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration
Main Authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Abstract: | In our increasingly interconnected world, where intelligent devices continually amass copious personalized multi-modal data, a pressing need arises to deliver high-quality, personalized, device-aware services. However, this endeavor presents a multifaceted challenge to prevailing artificial intelligence (AI) systems, which are primarily rooted in the cloud. As these systems grapple with shifting data distributions between the cloud and devices, the traditional approach of fine-tuning-based adaptation (FTA) suffers from two issues: the costly and time-consuming data annotation it requires and the looming risk of model overfitting. To surmount these challenges, we introduce a Universal On-Device Multi-modal Model Adaptation Framework that revolutionizes on-device model adaptation by striking a balance between efficiency and effectiveness. The framework features the Fast Domain Adaptor (FDA), hosted in the cloud, which provides tailored parameters for the Lightweight Multi-modal Model on devices. To enhance adaptability across multi-modal tasks while minimizing communication costs, the framework further employs the AnchorFrame Distribution Reasoner (ADR). Our contributions, encapsulated in the Cloud-Device Collaboration Multi-modal Parameter Generation (CDC-MMPG) framework, represent a pioneering solution for On-Device Multi-modal Model Adaptation (DMMA). Extensive experiments validate the efficiency and effectiveness of our method, particularly on video question answering and retrieval tasks, driving forward the integration of intelligent devices into our daily lives. |
DOI: | 10.48550/arxiv.2406.01601 |
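The abstract only sketches the architecture: a cloud-side Fast Domain Adaptor generates tailored parameters that an on-device lightweight model loads directly, so the device never runs backpropagation, and the AnchorFrame Distribution Reasoner compresses what is sent to the cloud. The following is a minimal, hedged PyTorch sketch of that idea only. All class and function names (`FastDomainAdaptor`, `LightweightAdapter`, `load_generated`) and the mean-pooled "domain summary" standing in for the ADR message are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of cloud-side parameter generation with backprop-free loading
# on the device. Names and shapes are assumptions for illustration only.
import torch
import torch.nn as nn


class FastDomainAdaptor(nn.Module):
    """Cloud-side generator: maps a summary of the device's recent multi-modal
    data to adapter parameters (assumed hypernetwork-style reading of the
    abstract's 'tailored parameters')."""

    def __init__(self, feat_dim: int, adapter_dim: int, hidden: int = 256):
        super().__init__()
        self.feat_dim, self.adapter_dim = feat_dim, adapter_dim
        self.n_params = feat_dim * adapter_dim + adapter_dim
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, self.n_params)
        )

    @torch.no_grad()  # no gradients needed for parameter generation at serving time
    def forward(self, domain_summary: torch.Tensor) -> dict[str, torch.Tensor]:
        flat = self.net(domain_summary)
        cut = self.feat_dim * self.adapter_dim
        weight = flat[:cut].view(self.adapter_dim, self.feat_dim)
        bias = flat[cut:]
        return {"weight": weight, "bias": bias}


class LightweightAdapter(nn.Module):
    """On-device adapter whose weights are overwritten by the cloud, so the
    device performs no backpropagation at all."""

    def __init__(self, feat_dim: int, adapter_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, adapter_dim)

    def load_generated(self, params: dict[str, torch.Tensor]) -> None:
        # Plain in-place copies; called under torch.no_grad().
        self.proj.weight.copy_(params["weight"])
        self.proj.bias.copy_(params["bias"])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


# --- Toy end-to-end round trip (shapes only) ---------------------------------
feat_dim, adapter_dim = 512, 64
fda = FastDomainAdaptor(feat_dim, adapter_dim)       # lives in the cloud
adapter = LightweightAdapter(feat_dim, adapter_dim)  # lives on the device

# Device: compress a handful of anchor-frame features into one summary vector
# (a stand-in for whatever compact message the ADR would actually send).
anchor_feats = torch.randn(8, feat_dim)              # e.g. 8 anchor frames
domain_summary = anchor_feats.mean(dim=0)

# Cloud generates adapter parameters; device loads them without any backprop.
with torch.no_grad():
    generated = fda(domain_summary)
    adapter.load_generated(generated)
    out = adapter(torch.randn(1, feat_dim))          # adapted inference
print(out.shape)  # torch.Size([1, 64])
```

In this reading, the only training happens in the cloud (the generator's own weights), while the device does a forward pass and a weight copy, which is what makes the adaptation backpropagation-free on-device.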